ON APPEAL TO THE BOARD OF PATENT APPEALS AND INTERFERENCES

APPELLANTS' BRIEF 



MAIL STOP APPEAL BRIEF - PATENTS 

Commissioner for Patents 
P.O. Box 1450 

Alexandria, Virginia 22313-1450 
Dear Sir: 

This Appeal Brief, filed in connection with the above captioned patent application, is 
responsive to the Final Office Action mailed on October 20, 2004. A Notice of Appeal was filed 
herein on January 20, 2005. This brief is timely filed since August 20, 2005 is a Saturday. A 
request for a five month extension of time is filed concurrently herewith. Appellants hereby 
appeal to the Board of Patent Appeals and Interferences from the final rejection in this case. 

The Commissioner is authorized to charge any fees which may be required, including 
extension fees, or credit any overpayment to Deposit Account No. 08-1641 (referencing 
Attorney's Docket No. 39780-2730 P1C29).






The following constitutes the Appellants' Brief on Appeal. 

I. REAL PARTY IN INTEREST 

The real party in interest is Genentech, Inc., South San Francisco, California, by an 
assignment of the parent application, U.S. Serial No. 09/941,992 recorded November 16, 2001, 
at Reel 012176 and Frame 0450. 

II. RELATED APPEALS AND INTERFERENCES 

The claims pending in the current application are directed to a polypeptide referred to 
herein as "PRO1097". There exist two related patent applications, (1) U.S. Serial No.
09/997,628, filed November 15, 2001 (containing claims directed to antibodies to the PRO1097
polypeptide), and (2) U.S. Serial No. 09/989,723, filed November 19, 2001 (containing claims
directed to nucleic acids encoding PRO1097 polypeptides). These two related applications are
also under final rejection from the same Examiner and based upon the same outstanding
rejection; therefore, appeals of these final rejections are being pursued independently and
concurrently herewith.

III. STATUS OF CLAIMS 

Claims 119-126 and 129-131 are in this application. 
Claims 1-118 and 127-128 have been canceled. 

Claims 119-126 and 129-131 stand rejected and Appellants appeal the rejection of these
claims.

A copy of the rejected claims in the present Appeal is provided as Appendix A. 

IV. STATUS OF AMENDMENTS 

In an Amendment filed on March 10, 2005 after the mailing of the Final Office Action of
October 18, 2004, a request under 37 C.F.R. § 1.48 for correction of inventorship was filed, and
this amendment was entered for purposes of this appeal.
V. SUMMARY OF CLAIMED SUBJECT MATTER

The invention claimed in the present application is related to an isolated polypeptide 
comprising the amino acid sequence of the polypeptide of SEQ ID NO: 349, referred to in the 
present application as "PRO1097." The PRO1097 gene was shown for the first time in the 
present application to be significantly amplified in human lung or colon cancers as compared to 
normal, non-cancerous human tissue controls (Example 170). This feature is specifically recited 
in claim 124, and carried by all claims dependent from claim 124. In addition, the invention also 
claims the amino acid sequence of the polypeptide of SEQ ID NO: 349, lacking its associated 
signal-peptide; or the amino acid sequence of the polypeptide encoded by the full-length coding 
sequence of the cDNA deposited under ATCC accession number 203044 (Claims 124-126 and
129). The invention is further directed to polypeptides having at least 80%, 85%, 90%, 95% or 
99% amino acid sequence identity to the amino acid sequence of the polypeptide of SEQ ID NO: 
349; the amino acid sequence of the polypeptide of SEQ ID NO: 349, lacking its associated 
signal peptide; or the amino acid sequence of the polypeptide encoded by the full-length coding 
sequence of the cDNA deposited under ATCC accession number 203044, wherein the nucleic 
acid encoding said polypeptide is amplified in lung or colon tumor (Claims 119-123). The
invention is further directed to a chimeric polypeptide comprising one of the above polypeptides 
fused to a heterologous polypeptide (Claim 130), and to a chimeric polypeptide wherein the 
heterologous polypeptide is an epitope tag or an Fc region of an immunoglobulin (Claim 131). 

The amino acid sequence of the native "PRO1097" polypeptide and the nucleic acid
sequence encoding this polypeptide (referred to in the present application as "DNA59841-1460")
are shown in the present specification as SEQ ID NOs: 349 and 348, respectively, and in Figures
244 and 243, respectively, found on page 299, lines 30-34. A full-length PRO1097 polypeptide
having the amino acid sequence of SEQ ID NO:349 is described in the specification at, for
example, pages 218-220, line 30 onwards, and the isolation of cDNA clones encoding
PRO1097 of SEQ ID NO:349 is described in Example 107, page 489 of the specification. The
specification discloses that various portions of the PRO1097 polypeptide possess significant
sequence similarity to the glycoprotease family of proteins and the acyltransferase
ChoActase/COT/CPT family (see, for example, page 218, lines 31-34).

PRO polypeptide variants having at least about 80-99% amino acid sequence identity 
with a full length PRO polypeptide sequence, or a PRO polypeptide sequence lacking the signal
peptide are described in the specification at, for example, page 305, line 23 onwards, and percent
amino acid sequence identity determination is described at, for example, pages 306-308, line 14 
onwards. The preparation of chimeric PRO polypeptides (claims 130 and 131), including those 
wherein the heterologous polypeptide is an epitope tag or an Fc region of an immunoglobulin, is 
set forth in the specification at page 374, line 24 to page 375, line 9. Examples 140-143 and
page 376, line 12 onwards describe the expression of PRO polypeptides in various host cells,
including E. coli, mammalian cells, yeast and baculovirus-infected insect cells.

Finally, Example 170, in the specification at page 539, line 19, to page 555, line 5, sets 
forth a 'Gene Amplification assay' which shows that the PRO1097 gene is amplified in the
genome of certain human lung or colon cancers (see Table 9, page 550). The profiles of various 
primary lung and colon tumors used for screening the PRO polypeptide compounds of the 
invention in the gene amplification assay are summarized on Table 8, page 546 of the 
specification. 

VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL 

1. Whether Claims 119-126 and 129-131 should be accorded priority of provisional
Application 60/141,037, filed 19 November, 2001.

2. Whether Claims 119-126 and 129-131 satisfy the utility/enablement requirement
under 35 U.S.C. §101/112, first paragraph.

3. Whether Claims 119-126 and 129-131 satisfy the written description requirement
under 35 U.S.C. §112, first paragraph.

VII. GROUPING OF CLAIMS 

With respect to Issue 1, all claims (Claims 119-126 and 129-131) stand and fall together. 
With respect to Issue 2, all claims (Claims 119-126 and 129-131) stand and fall together. 
Issue 3 concerns only Claims 119-126, which claims stand and fall together.
VIII. ARGUMENTS

Summary of the Arguments 

Issue 1: Priority

The instant application has not been granted the earlier priority date on the grounds that 
"although disclosing the same experimental assays as the instant specification, do not enable the 
instant invention and therefore do not impart utility. . ." 

Appellants submit that data derived from the Gene Amplification assay was first
disclosed in U.S. Application Serial No. 60/141,037, filed 19 November, 2001 for the gene
encoding the claimed PRO1097 polypeptide. Appellants further submit that the same detailed
reasons discussed below under the section on Issue 2: Utility/Enablement are sufficient to also
establish patentable utility for U.S. Application Serial No. 60/141,037. Hence, Appellants
should be able to rely upon this provisional application to provide an effective filing date of 19
November, 2001 for the instant application.

Issue 2: Utility/Enablement

Claims 119-126 and 129-131 stand rejected under 35 U.S.C. §101/112, first paragraph as
allegedly lacking either a specific and substantial asserted utility or a well established utility.
Appellants have previously submitted that patentable utility for the PRO1097 polypeptides is
based upon the gene amplification data for the gene encoding the PRO1097 polypeptide. The
specification discloses that the gene encoding PRO1097 showed significant amplification,
ranging from 2.313 to 2.346 fold in two different lung primary tumors and 2.114 to 2.532 fold in
three different colon primary tumors. Therefore, such a gene is useful as a marker for the
diagnosis of cancer, and for monitoring cancer development and/or for measuring the efficacy of
cancer therapy.

In the first Office action mailed May 3, 2004, the Examiner cited references Hittelman et
al. and Crowell et al., to show that "a slight increase in clone numbers in a cancerous tissue is no
doubt due to an increased number of chromosomes, a very common characteristic of cancerous
and non-cancerous epithelial cells." Appellants submit that, in fact, the Hittelman reference
supports the Appellants' position that there is utility in identifying genetic biomarkers in
epithelial tissues at cancer risk (see Hittelman, abstract, lines 4-7).
The Examiner further cited references Skolnick et al., Bork et al., Doerks et al.,
Hesselgesser et al. and Blease et al. to show that "function cannot be predicted based solely on
structural similarity to a protein found in sequence databases." Appellants had argued in their
response of August 3, 2004 that Appellants' assertion of utility for PRO1097 was not based on
structural similarity.

The Examiner further asserted on page 3 of the Final Office Action mailed October 18, 
2004 that amplification of the PRO1097 polynucleotide does not impart a specific, substantial, 
and credible utility to the PRO1097 polypeptide since, "there is no evidence regarding whether
or not PRO1097 mRNA or polypeptide levels are also increased in (these) cancer." In support of
this assertion, the Examiner cited references by Pennica et al., Haynes et al. and Hu et al.

Appellants submit that the teachings of Pennica et al. are not directed towards genes in
general but to a single gene or genes within a single family and thus, their teachings cannot
support a general conclusion regarding correlation between gene amplification and mRNA or
protein levels. Further, Appellants submit that the teachings of Haynes et al. in fact meet the
"more likely than not" standard and show that a positive correlation exists between mRNA and
protein. And based on the nature of the statistical analysis performed in one class of genes in Hu
et al., the Examiner's conclusions are not reliably supported. Thus, Appellants submit that these
references do not conclusively establish a prima facie case for lack of utility.

In contrast, Appellants have submitted ample evidence to show that, in general, if a gene 
is amplified in cancer, it is more likely than not that the encoded protein will be expressed at an 
elevated level. First, the articles by Orntoft et al., Hyman et al., and Pollack et al. (made of
record in Appellants' Response filed July 7, 2004) collectively teach that in general, gene
amplification increases mRNA expression. Second, the Declaration of Dr. Paul Polakis (made of
record in Appellants' Response filed August 3, 2004), principal investigator of the Tumor
Antigen Project of Genentech, Inc., the assignee of the present application, shows that, in
general, there is a correlation between mRNA levels and polypeptide levels. Appellants further
note that the sale of gene expression chips to measure mRNA levels is a highly successful 
business, with a company such as Affymetrix recording 168.3 million dollars in sales of their 
GeneChip arrays in 2004. Clearly, the research community believes that the information 
obtained from these chips is useful (i.e., that it is more likely than not informative of the protein 
level). 




Taken together, although there are some examples in the scientific art that do not fit
within the central dogma of molecular biology that there is a correlation between DNA, mRNA,
and polypeptide levels, these instances are exceptions rather than the rule. In the majority of
amplified genes, as exemplified by Orntoft et al., Hyman et al., Pollack et al., the Polakis
Declaration and the widespread use of array chips, the teachings in the art overwhelmingly show
that gene amplification influences gene expression at the mRNA and protein levels. Therefore,
one of skill in the art would reasonably expect in this instance, based on the amplification data
for the PRO1097 gene, that the PRO1097 polypeptide is concomitantly overexpressed. Thus, the
claimed PRO1097 polypeptides also have utility in the diagnosis of cancer.

Appellants further submit that even if there is no correlation between gene amplification 
and increased mRNA/protein expression, (which Appellants expressly do not concede), a 
polypeptide encoded by a gene that is amplified in cancer would still have a specific, substantial, 
and credible utility. Appellants submit that, as evidenced by the Ashkenazi Declaration and the 
teachings of Hanna and Mornin (both made of record in Appellants' Response filed August 3, 
2004), simultaneous testing of gene amplification and gene product over-expression enables
more accurate tumor classification, even if the gene product, the protein, is not over-expressed.
This leads to better determination of a suitable therapy for the tumor, as demonstrated by a
real-world example of the breast cancer marker HER-2/neu. Accordingly, Appellants submit that
when the proper legal standard is applied, one should reach the conclusion that the present
application discloses at least one patentable utility for the claimed PRO1097 polypeptides.

The Examiner also cited references Smith et al, and Brenner et al, to support an 
enablement rejection that "it is not predictable which amino acids are necessary to maintain the 
functional characteristics of protein". 

Appellants submit that, besides the detailed protocol for the gene amplification assay, the
specification further provides ample guidance to allow the skilled artisan to identify those
polypeptides which meet the limitations of the claims, including how to identify polypeptides
based on % identity to SEQ ID NO: 349, how to make PRO1097 polypeptides, etc. Prediction
of the amino acid(s) necessary for functionality is not necessary to practice the invention.
Appellants further submit that, based on the gene amplification data and the substantial, credible,
asserted utility of PRO1097 polypeptides in the diagnosis of lung or colon cancer, one of
ordinary skill would know exactly how to make and use these claimed polypeptides for the
diagnosis of cancers, without any undue experimentation.

Issue 3: Written Description 

Claims 119-126 and 129-131 stand rejected under 35 U.S.C. §112, first paragraph,
allegedly "as containing subject matter which was not described in the specification in such a
way as to reasonably convey to one skilled in the relevant art that the inventor(s), at the time the
application was filed, had possession of the claimed invention." (Page 9 of the Final Office
Action mailed October 18, 2004).

The factors to be considered in evidencing possession of a claimed genus include 
"disclosure of complete or partial structure, physical and/or chemical properties, functional 
characteristics, structure/function correlation, methods of making the claimed product, or any 
combination thereof." Appellants note that the claims recite structural features, namely, at least
80% sequence identity to SEQ ID NO:349, which are common to the genus. The genus of claimed
polypeptides is further defined by having a specific functional activity for the encoding nucleic
acids, namely, that the encoding nucleic acid is amplified in lung or colon tumors. The
specification provides detailed guidance as to how to identify polypeptides having at least 80%
amino acid sequence identity to SEQ ID NO:349 (PRO1097), as well as detailed protocols for
determining whether a gene encoding a variant PRO1097 protein is amplified in lung or colon
tumors. Thus one of skill in the art could easily identify whether a variant PRO1097 sequence
falls within the parameters of the claimed invention.

Accordingly, a description of the claimed genus has been achieved by the recitation of 
both structural and functional characteristics. 

Response to Rejections 

ISSUE 1. U.S. Provisional Application No. 60/141,037 Satisfies the Utility Requirement of 
35 U.S.C. § 101/ § 112, First Paragraph based on the results of the Gene Amplification assay 

Appellants have asserted that U.S. Provisional Application No. 60/141,037, filed 
November 19, 2001, discloses the gene amplification assay (shown in Example 170 of the instant 
specification) and establishes patentable utility for the claimed PRO1097 polypeptides. 
Appellants submit, for the reasons set forth below under Issue 2 for Utility/Enablement,
that the results of the gene amplification assay disclosed in the specification of U.S. Application
No. 60/141,037 provide at least one credible, substantial and specific asserted utility for the
claimed PRO1097 polypeptides under 35 U.S.C. §101/§112, first paragraph. Accordingly,
Appellants respectfully request that the subject matter of the instant claims be granted the
November 19, 2001, priority date of U.S. Provisional Application No. 60/141,037.

ISSUE 2. Claims 119-126 and 129-131 are supported by a credible, specific and substantial
asserted utility, and thus meet the utility requirement of 35 U.S.C. § 101/112, first
paragraph

The sole basis for the Examiner's rejection of Claims 119-126 and 129-131 under this
section is that the data presented in Example 170 of the present specification is allegedly 
insufficient under the present legal standards to establish a patentable utility under 35 U.S.C. § 
101 for the presently claimed subject matter. 

Claims 119-126 and 129-131 stand further rejected under 35 U.S.C. §112, first paragraph, 
allegedly "since the claimed invention is not supported by either a specific and substantial asserted 
utility or a well established utility for the reasons set forth above, one skilled in the art clearly would 
not know how to use the claimed invention." 

Appellants strongly disagree and, therefore, respectfully traverse the rejection. 

A. The Legal Standard For Utility Under 35 U.S.C. § 101

According to 35 U.S.C. § 101: 

Whoever invents or discovers any new and useful process, machine, manufacture, or 
composition of matter, or any new and useful improvement thereof, may obtain a patent 
therefor, subject to the conditions and requirements of this title. (Emphasis added.) 

In interpreting the utility requirement, in Brenner v. Manson 1 the Supreme Court held
that the quid pro quo contemplated by the U.S. Constitution between the public interest and the
interest of the inventors required that a patent applicant disclose a "substantial utility" for his or
her invention, i.e. a utility "where specific benefit exists in currently available form." 2 The Court
concluded that "a patent is not a hunting license. It is not a reward for the search, but
compensation for its successful conclusion. A patent system must be related to the world of
commerce rather than the realm of philosophy." 3

1 Brenner v. Manson, 383 U.S. 519, 148 U.S.P.Q. (BNA) 689 (1966).

Later, in Nelson v. Bowler 4 the C.C.P.A. acknowledged that tests evidencing
pharmacological activity of a compound may establish practical utility, even though they may
not establish a specific therapeutic use. The court held that "since it is crucial to provide
researchers with an incentive to disclose pharmaceutical activities in as many compounds as
possible, we conclude adequate proof of any such activity constitutes a showing of practical
utility." 5

In Cross v. Iizuka 6 the C.A.F.C. reaffirmed Nelson, and added that in vitro results might
be sufficient to support practical utility, explaining that "in vitro testing, in general, is relatively
less complex, less time consuming, and less expensive than in vivo testing. Moreover, in vitro
results with the particular pharmacological activity are generally predictive of in vivo test results,
i.e. there is a reasonable correlation there between." 7 The court perceived "No insurmountable
difficulty" in finding that, under appropriate circumstances, "in vitro testing, may establish a
practical utility." 8



2 Id. at 534, 148 U.S.P.Q. (BNA) at 695. 

3 Id. at 536, 148 U.S.P.Q. (BNA) at 696. 

4 Nelson v. Bowler, 626 F.2d 853, 206 U.S.P.Q. (BNA) 881 (C.C.P.A. 1980). 

5 Id. at 856, 206 U.S.P.Q. (BNA) at 883. 

6 Cross v. Iizuka, 753 F.2d 1047, 224 U.S.P.Q. (BNA) 739 (Fed. Cir. 1985). 

7 Id. at 1050, 224 U.S.P.Q. (BNA) at 747.

8 Id.
The case law has also clearly established that Appellants' statements of utility are usually
sufficient, unless such statement of utility is unbelievable on its face. 9 The PTO has the initial
burden to prove that Appellants' claims of usefulness are not believable on their face. 10 In
general, an Applicant's assertion of utility creates a presumption of utility that will be sufficient
to satisfy the utility requirement of 35 U.S.C. §101, "unless there is a reason for one skilled in
the art to question the objective truth of the statement of utility or its scope." 11, 12

Compliance with 35 U.S.C. §101 is a question of fact. 13 The evidentiary standard to be
used throughout ex parte examination in setting forth a rejection is a preponderance of the
totality of the evidence under consideration. 14 Thus, to overcome the presumption of truth that
an assertion of utility by the applicant enjoys, the Examiner must establish that it is more likely
than not that one of ordinary skill in the art would doubt the truth of the statement of utility.
Only after the Examiner has made a proper prima facie showing of lack of utility does the burden of
rebuttal shift to the applicant. The issue will then be decided on the totality of evidence.

The well established case law is clearly reflected in the Utility Examination Guidelines
("Utility Guidelines") 15, which acknowledge that an invention complies with the utility
requirement of 35 U.S.C. §101, if it has at least one asserted "specific, substantial, and credible
utility" or a "well-established utility." Under the Utility Guidelines, a utility is "specific" when
it is particular to the subject matter claimed. For example, it is generally not enough to state that
a nucleic acid is useful as a diagnostic without also identifying the conditions that are to be
diagnosed.

9 In re Gazave, 379 F.2d 973, 154 U.S.P.Q. (BNA) 92 (C.C.P.A. 1967).

10 Ibid.

11 In re Langer, 503 F.2d 1380, 1391, 183 U.S.P.Q. (BNA) 288, 297 (C.C.P.A. 1974).

12 See also In re Jolles, 628 F.2d 1322, 206 USPQ 885 (C.C.P.A. 1980); In re Irons, 340
F.2d 974, 144 USPQ 351 (1965); In re Sichert, 566 F.2d 1154, 1159, 196 USPQ 209, 212-13
(C.C.P.A. 1977).

13 Raytheon v. Roper, 724 F.2d 951, 956, 220 U.S.P.Q. (BNA) 592, 596 (Fed. Cir. 1983)
cert. denied, 469 US 835 (1984).

14 In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d (BNA) 1443, 1444 (Fed. Cir.
1992).

15 66 Fed. Reg. 1092 (2001).

In explaining the "substantial utility" standard, M.P.E.P. §2107.01 cautions, however,
that Office personnel must be careful not to interpret the phrase "immediate benefit to the
public" or similar formulations used in certain court decisions to mean that products or services
based on the claimed invention must be "currently available" to the public in order to satisfy the
utility requirement. "Rather, any reasonable use that an applicant has identified for the invention
that can be viewed as providing a public benefit should be accepted as sufficient, at least with
regard to defining a 'substantial' utility." 16 Indeed, the Guidelines for Examination of
Applications for Compliance With the Utility Requirement 17 give the following instruction to
patent examiners: "If the applicant has asserted that the claimed invention is useful for any
particular practical purpose . . . and the assertion would be considered credible by a person of
ordinary skill in the art, do not impose a rejection based on lack of utility."

B. Proper Application of the Legal Standard 

Appellants respectfully submit that the data presented in Example 170 starting on page
539 of the specification and the cumulative evidence of record, which
underlies the current dispute, indeed support a "specific, substantial and credible" asserted utility
for the presently claimed invention.

Example 170 describes the results obtained using a very well-known and routinely
employed polymerase chain reaction (PCR)-based assay, the TaqMan™ PCR assay, also referred
to herein as the gene amplification assay. This assay allows one to quantitatively measure the
level of gene amplification in a given sample, say, a tumor extract, or a cell line. It was well
known in the art at the time the invention was made that gene amplification is an essential
mechanism for oncogene activation. Appellants isolated genomic DNA from a variety of primary
cancers and cancer cell lines that are listed in Table 9 (pages 539 onwards of the specification),
including primary lung and colon cancers of the type and stage indicated in Table 8 (page 546).
The tumor samples were tested in triplicate with TaqMan™ primers and with internal controls,
beta-actin and GAPDH, in order to quantitatively compare DNA levels between samples (page
548, lines 33-34). As a negative control, DNA was isolated from the cells of ten normal healthy
individuals, which was pooled and used as a control (page 539, lines 27-29); no-template
controls were also used (page 548, lines 33-34). The results of TaqMan™ PCR are reported in ΔCt
units, as explained in the passage on page 539, lines 37-39. One unit corresponds to one PCR
cycle or approximately a 2-fold amplification relative to control, two units correspond to 4-fold,
three units to 8-fold amplification, and so on. Using this PCR-based assay, Appellants showed that
the gene encoding PRO1097 was amplified, that is, it showed approximately 1.21-1.23 ΔCt
units for lung tumors and 1.08-1.34 ΔCt units for colon tumors, which corresponds to 2^1.21 to
2^1.23-fold amplification in lung tumors and 2^1.08 to 2^1.34-fold amplification in colon tumors,
respectively, or 2.313 to 2.346 fold in two different lung primary tumors and 2.114 to 2.532 fold
in three different colon primary tumors.

16 M.P.E.P. §2107.01.

17 M.P.E.P. §2107 II (B)(1).

The Examiner acknowledged that there was an "increase" in DNA, but stated that the 
increase was "slight" or "small". In fact, based on Hittelman et al, the Examiner stated that such 
a "slight increase in clone numbers in cancerous tissue is no doubt due to an increased number of 
chromosomes, a very common characteristic of cancerous and non-cancerous epithelial cells." 
Appellants disagree. 

Hittelman studied premalignant lesions and suggests that epithelial tumors develop 
through a multistep process driven by genetic instability (see abstract). Hittelman showed that a 
subset of the same molecular changes found in associated tumor were also found in premalignant 
lesions, suggesting that these premalignant lesions might represent precursor lesions for 
associated tumors, i.e., a manifestation of a multistep tumorigenesis process. (See Hittelman, 
page 4, last three lines). Appellants therefore submit that, contrary to the Examiner's rejection,
the Hittelman reference strongly supports the Appellants' position that there is utility in
identifying genetic biomarkers in epithelial tissues at cancer risk (also see Hittelman, abstract,
lines 4-7). Hittelman adds on page 2, fourth paragraph, line 3 that "it is important to identify
individuals at significantly increased cancer risk who might best benefit from different types of
intervention". Taken together, even if the observed PRO1097 gene amplification were
due to chromosomal aneuploidy (which Appellants do not concede),
identifying genetic biomarkers like the PRO1097 gene associated with this aneuploidy is a very
important and useful step, according to Hittelman, in identifying individuals at significantly increased
cancer risk. Therefore, Hittelman supports at least one utility for the PRO1097 gene, that is, as a
genetic biomarker for cancer or precancerous cells. As one skilled in the art would clearly know,
early detection of lung cancer provides information in advance about risk assessment, prognosis 
and therapy for lung cancer. 

As evidence that the "increase in DNA" in the gene amplification assay is significant, 
Appellants submit a Declaration by Dr. Audrey Goddard (copy enclosed herewith). The 
Declaration by Dr. Audrey Goddard provides a statement by an expert in the relevant art that 
"fold amplification" values of at least 2-fold are considered significant in the TaqMan™ PCR 
gene amplification assay. This Declaration is necessary at this time to counter the assertion that 
the gene amplification data does not have utility. The issue whether the fold increase in the gene 
amplification assay for the PRO1097 gene was "significant" was not raised in the First Office 
action mailed May 3, 2004 nor in the Final Office action mailed October 18, 2004. Therefore, 
this declaration addressing "significance" was not presented earlier since the Appellants had no 
opportunity or reason to address this issue until now. Thus good and sufficient reasons exist why 
this Declaration is necessary and was not earlier presented. Appellants therefore submit that 
entry of the Goddard Declaration is appropriate at this time and respectfully request that it be 
considered.

Appellants particularly draw the Board's attention to page 3 of the Goddard Declaration 
which clearly states that: 

It is further my considered scientific opinion that an at least 2-fold increase in gene copy 
number in a tumor tissue sample relative to a normal (i.e., non-tumor) sample is 
significant and useful in that the detected increase in gene copy number in the tumor 
sample relative to the normal sample serves as a basis for using relative gene copy 
number as quantitated by the TaqMan PCR technique as a diagnostic marker for the 
presence or absence of tumor in a tissue sample of unknown pathology. Accordingly, a 
gene identified as being amplified at least 2-fold by the quantitative TaqMan PCR assay 
in a tumor sample relative to a normal sample is useful as a marker for the diagnosis of 
cancer, for monitoring cancer development and/or for measuring the efficacy of cancer 
therapy. (Emphasis added). 

Accordingly, the 2.313 to 2.346 fold amplification in two different lung primary tumors and the 
2.114 to 2.532 fold amplification in three different colon primary tumors would be considered 
significant and credible by one skilled in the art, based upon the facts disclosed in the Goddard 
Declaration. 
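
By way of illustration only, the comparison described above can be sketched in a few lines of Python. The 2-fold threshold and the fold amplification ranges are those quoted in this brief; the variable and function names are hypothetical and form no part of the specification or of the Goddard Declaration.

```python
# Illustrative sketch only. The threshold and the fold values are those quoted
# in the brief; every name below is a hypothetical label chosen for this example.
SIGNIFICANCE_THRESHOLD = 2.0  # "at least 2-fold" per the Goddard Declaration

# Lower and upper fold amplification values reported for each tumor type.
reported_fold_amplification = {
    "lung primary tumors": (2.313, 2.346),
    "colon primary tumors": (2.114, 2.532),
}


def meets_threshold(fold, threshold=SIGNIFICANCE_THRESHOLD):
    """Return True when a fold amplification value is at least the threshold."""
    return fold >= threshold


def summarize(ranges):
    """Map each tumor type to True when every reported value meets the threshold."""
    return {
        tissue: all(meets_threshold(value) for value in bounds)
        for tissue, bounds in ranges.items()
    }


summary = summarize(reported_fold_amplification)
for tissue, all_significant in summary.items():
    print(f"{tissue}: all reported values >= {SIGNIFICANCE_THRESHOLD}-fold? {all_significant}")
```

Each reported value lies above the 2-fold threshold, which is the only point the sketch is intended to show.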

Further, Appellants submit that the fact that two lung tumor samples and three colon 
tumor samples tested positive in this study does not make the gene amplification data, by any 
means, less significant or spurious. As any skilled artisan in the field of oncology would easily 
appreciate, not all tumor markers are generally associated with every tumor, or even with most 
tumors. In fact, some tumor markers are useful for identifying rare malignancies. That is, the 
association of the tumor marker with a particular type of tumor lesion may be rare, or the 
occurrence of that particular kind of tumor lesion itself may be rare. In either event, even these 
rare tumor markers, which may not give a positive hit with most common tumors, have great 
value in tumor diagnosis, and consequently, in tumor prognosis. The skilled artisan would know 
that such tumor markers are very useful for better classification of tumors. Therefore, whether 
the PRO1097 gene is amplified in two lung and three colon tumors or in most tumors is not 
relevant to its identification as a tumor marker, or its patentable utility. Rather, whether the 
amplification data for PRO1097 is significant is what lends support to its usefulness as a tumor 
marker. It was well known in the art at the time of filing of the application that gene 
amplification, which occurs in most solid tumors like lung and colon cancers, is generally 
associated with poor prognosis. Therefore, the PRO1097 gene becomes an important diagnostic 
marker to identify such malignant lung or colon cancers, even if the malignancy associated with 
the PRO1097 molecule is a rare occurrence. Accordingly, the present specification clearly 
discloses enough evidence that the gene encoding the PRO1097 polypeptide is significantly 
amplified in certain types of lung or colon tumors and is, therefore, a valuable diagnostic marker 
for identifying certain types of lung or colon cancers. 

On page 4 of the Final Office Action, the Examiner points out that "there is no evidence 
regarding whether or not PRO1097 mRNA or polypeptide levels are also increased in this 
cancer". The Examiner points out, especially on pages 4-5 of the Final Office Action mailed on 
October 18, 2004, that: 

"what is often seen is a lack of correlation between DNA amplification and increased 
peptide levels (Pennica et al.) As discussed by Haynes et al, polypeptide levels cannot 
be accurately predicted from mRNA levels. . .the literature cautions researches against 
drawing conclusions based on small changes in transcript expression levels between 
normal and cancerous tissue." 

Appellants strongly disagree. Appellants submit that the Examiner applied an improper 
legal standard when making this rejection. The evidentiary standard to be used throughout ex 
parte examination of a patent application is a preponderance of the totality of the evidence under 
consideration. Thus, to overcome the presumption of truth that an assertion of utility by the 
applicant enjoys, the Examiner must establish that it is more likely than not that one of ordinary 
skill in the art would doubt the truth of the statement of utility. Only after the Examiner has 
made a proper prima facie showing of lack of utility, does the burden of rebuttal shift to the 
applicant. 

Accordingly, it is not a legal requirement to establish a necessary correlation between an 
increase in the copy number of the DNA and protein expression levels that would correlate to the 
disease state, nor is it imperative to find evidence that DNA amplification is "necessarily" or 
"always" associated with overexpression of the gene product. Appellants respectfully submit 
that when the proper evidentiary standard is applied, a correlation must be acknowledged. 

First of all, the teachings of Pennica et al. are specific to WISP genes, a specific class of 
closely related molecules. Pennica et al. showed that there was good correlation between DNA 
and mRNA expression levels for the WISP-1 gene but not for the WISP-2 and WISP-3 genes. But 
the fact that, in the case of closely related molecules, there seemed to be no correlation between 
gene amplification and the level of mRNA/protein expression does not establish that it is more 
likely than not, in general, that such correlation does not exist. As discussed above, the standard 
is not absolute certainty. Pennica et al. has no teaching whatsoever about the correlation of gene 
amplification and protein expression for genes in general. Indeed, the working hypothesis 
among those skilled in the art is that, if a gene is amplified in cancer, the encoded protein is 
likely to be expressed at an elevated level. In fact, as noted even in Pennica et al., "[a]n analysis 
of WISP-1 gene amplification and expression in human colon tumors showed a correlation 
between DNA amplification and over-expression ..." (Pennica et al., page 14722, left column, 
first full paragraph, emphasis added). Accordingly, Appellants respectfully submit that Pennica 
et al. teaches nothing conclusive regarding the absence of correlation between gene amplification 
and over-expression of mRNA or polypeptides in most genes, in general. 

Further, contrary to the Examiner's reading, the Haynes et al. reference teaches that 
"there was a general trend but no strong correlation between protein [expression] and transcript 
levels" (Emphasis added). For example, in Figure 1, there is a positive correlation between 
mRNA and protein levels amongst most of the 80 yeast proteins studied. In fact, very few data 
points deviated or scattered away from the expected normal and no data points showed a 
negative correlation between mRNA and protein levels (i.e., an increase in mRNA resulted in a 
decrease in protein levels). Moreover, the analysis by Haynes et al. is not relevant to the current 
application. Haynes et al. studied yeast cells and not human cells. Haynes et al. note that their 
analysis focused on the 80 most abundant proteins in the yeast lysate (page 1867). Haynes et al. 
state "since many important regulatory protein are present only at low abundance, these would 
not be amenable to analysis" (page 1867). Further, Haynes et al. compared the protein 
expression levels of these naturally abundant proteins to mRNA expression levels from 
published SAGE frequency tables (page 1863). Accordingly, Haynes et al. did not compare 
mRNA expression levels and protein levels in the same yeast cells. Thus, the analysis by 
Haynes et al. is not applicable to the present application. In fact, when the proper legal standard 
is used, Haynes' teachings clearly support the Appellants' position and are all that is needed to 
meet the "more likely than not" evidentiary standard. Again, accurate prediction is not the 
standard. 

The Examiner further cited Hu et al, to show that "by the current literature. . .one skilled 
in the art would not assume that a small increase in gene copy number would correlate with 
significantly increased mRNA or polypeptide levels" (Page 5 of the Final Office action mailed 
October 18, 2004). 

First of all, as discussed above, the increase in DNA copy number for the PRO1097 gene 
is significant. Further, Appellants respectfully submit that, contrary to the Examiner's assertion, 
the cited Hu et al. reference does not conclusively establish a prima facie case for lack of utility 
for the PRO1097 molecule. The Hu et al. reference is entitled "Analysis of Genomic and 
Proteomic Data using Advanced Literature Mining" (emphasis added). Therefore, as the title 
itself suggests, the conclusions in this reference are based upon statistical analysis of information 
obtained from published literature, and not from experimental data. Hu et al. performed 
statistical analysis to provide evidence for a relationship between mRNA expression and 
biological function of a given molecule (as in disease). The conclusions of Hu et al., however, 
only apply to a specific type of breast tumor (estrogen receptor (ER)-positive breast tumor) and 
cannot be generalized to breast cancer genes in general, let alone to cancer genes in general. 
"Interestingly, the observed correlation was only found among ER-positive (breast) tumors not 
ER-negative tumors." (See page 412, left column). 

Moreover, the analytical methods utilized by Hu et al. have certain statistical drawbacks, 
as the authors themselves admit. For instance, according to Hu et al., "different statistical 
methods" were applied to "estimate the strength of gene-disease relationships and evaluated the 
results." (See page 406, left column, emphasis added). Using these different statistical methods, 
Hu et al. "[a]ssessed the relative strengths of gene-disease relationships based on the frequency 
of both co-citation and single citation." (See page 411, left column). As is well known in the 
art, different statistical methods allow different variables to be manipulated to affect the resulting 
outcome. In this regard, the authors disclose that, "Initial attempts to search the literature" using 
the list of genes, gene names, gene symbols, and frequently used synonyms generated by the 
authors "revealed several sources of false positives and false negatives." (See page 406, right 
column). The authors add that the false positives caused by "duplicative and unrelated meanings 
for the term" were "difficult to manage." Therefore, in order to minimize such false positives, 
Hu et al. disclose that these terms "had to be eliminated entirely, thereby reducing the false 
positive rate but unavoidably under-representing some genes." Id. (emphasis added). Hence, Hu 
et al. had to manipulate certain aspects of the input data, in order to generate, in their opinion, 
meaningful results. Further, because the frequency of citation for a given molecule and its 
relationship to disease only reflects the current research interest of a molecule, and not the true 
biological function of the molecule, as the authors themselves acknowledge, the "[r]elationship 
established by frequency of co-citation do not necessarily represent a true biological link." (See 
page 411, right column). Therefore, based on these findings, the authors add, "[t]his may reflect 
a bias in the literature to study the more prevalent type of tumor in the population. Furthermore, 
this emphasizes that caution must be taken when interpreting experiments that may contain 
subpopulations that behave very differently." Id. (Emphasis added). In other words, some 
molecules may have been underrepresented merely because they were less frequently cited or 
studied in literature compared to other more well-cited or studied genes. Therefore, Hu et al.'s 
conclusions are not based on genes/mRNA in general. 

Therefore, Appellants submit that, based on the nature of the statistical analysis 
performed therein, and in particular, based on Hu's analysis of one class of genes, namely, the 
estrogen receptor (ER)-positive breast tumor genes, the conclusion drawn by the Examiner, 
namely that "genes displaying a 5-fold change or less (mRNA expression) in tumors compared 
to normal showed no evidence of a correlation between altered gene expression and a known role 
in the disease (in general)", is not reliably supported. 

Therefore, when the proper legal standard is used, the Examiner has not established a 
prima facie case of lack of utility based on the cited references Pennica et al., Haynes et al., or 
Hu et al. 

On the contrary, Appellants submit that Example 170 in the specification further 
discloses that "[a]mplification is associated with overexpression of the gene product, indicating 
that the polypeptides are useful targets for therapeutic intervention in certain cancers such as 
colon, lung, breast and other cancers and diagnostic determination of the presence of those 
cancers" (emphasis added). Besides, Appellants have submitted ample evidence (discussed 
below) to show that, in general, if a gene is amplified in cancer, it is "more likely than not" 
that the encoded protein will also be expressed at an elevated level. 

For support, Appellants presented the articles by Orntoft et al., Hyman et al., and Pollack 
et al. (made of record in Appellants' Response filed August 3, 2004), which collectively teach 
that in general for most genes, DNA amplification increases mRNA expression. The results 
presented by Orntoft et al., Hyman et al., and Pollack et al. are based upon wide ranging 
analyses of a large number of tumor associated genes. Orntoft et al. studied transcript levels of 
5600 genes in malignant bladder cancers, many of which were linked to the gain or loss of 
chromosomal material, and found that in general (18 of 23 cases) chromosomal areas with more 
than 2-fold gain of DNA showed a corresponding increase in mRNA transcripts. Hyman et al. 
compared DNA copy numbers and mRNA expression of over 12,000 genes in breast cancer 
tumors and cell lines, and found that there was evidence of a prominent global influence of copy 
number changes on gene expression levels. In Pollack et al., the authors profiled DNA copy 
number alteration across 6,691 mapped human genes in 44 predominantly advanced primary 
breast tumors and 10 breast cancer cell lines, and found that on average, a 2-fold change in DNA 
copy number was associated with a corresponding 1.5-fold change in mRNA levels. In summary, 
the evidence supports the Appellants' position that gene amplification is more likely than not 
predictive of increased mRNA and polypeptide levels. 

Second, the Declaration of Dr. Paul Polakis (made of record in Appellants' Response 
filed August 3, 2004), principal investigator of the Tumor Antigen Project of Genentech, Inc., 
the assignee of the present application, explains that in the course of Dr. Polakis' research using 
microarray analysis, he and his co-workers identified approximately 200 gene transcripts that are 
present in human tumor cells at significantly higher levels than in corresponding normal human 
cells. Appellants submit that Dr. Polakis' Declaration was presented to support the position that 
there is a correlation between mRNA levels and polypeptide levels, the correlation between gene 
amplification and mRNA levels having already been established by the data shown in the Orntoft 
et al., Hyman et al., and Pollack et al. articles. Appellants further emphasize that the opinions 
expressed in the Polakis Declaration, including in the above quoted statement, are all based on 
factual findings. For instance, antibodies binding to about 30 of these tumor antigens were 
prepared, and mRNA and protein levels were compared. In approximately 80% of the cases, the 
researchers found that increases in the level of a particular mRNA correlated with changes in the 
level of protein expressed from that mRNA when human tumor cells are compared with their 
corresponding normal cells. Therefore, Dr. Polakis' research, which is referenced in his 
Declaration, shows that, in general, there is a correlation between increased mRNA and 
polypeptide levels. 

Appellants further note that the sale of gene expression chips to measure mRNA levels is 
a highly successful business, with a company such as Affymetrix recording 168.3 million dollars 
in sales of their GeneChip® arrays in 2004. Clearly, the research community believes that the 
information obtained from these chips is useful (i.e., that it is more likely than not that the results 
are informative of protein levels). 

Taken together, all of the submitted evidence supports the Appellants' position that, in 
the majority of amplified genes, increased gene amplification levels, more likely than not, predict 
increased mRNA and polypeptide levels, which clearly meets the utility standards described 
above. Hence, one of skill in the art would reasonably expect that, based on the gene 
amplification data of the PRO1097 gene, the PRO1097 polypeptide is concomitantly 
overexpressed in the lung or colon tumors studied as well. 

Appellants further submit that, even if there were no correlation between gene 
amplification and increased mRNA/protein expression (which Appellants expressly do not 
concede), a polypeptide encoded by an amplified gene in cancer would still have a specific, 
substantial, and credible utility as explained below. As the Declaration of Dr. Avi Ashkenazi 
(submitted with Appellants' Response filed August 3, 2004) explains: 

"even when amplification of a cancer marker gene does not result in significant over- 
expression of the corresponding gene product, this very absence of gene product over- 
expression still provides significant information for cancer diagnosis and treatment." 

Thus, even if over-expression of the gene product does not parallel gene amplification in 
certain tumor types, parallel monitoring of gene amplification and gene product over-expression 
enables more accurate tumor classification and hence better determination of suitable therapy. In 
addition, absence of over-expression is crucial information for the practicing clinician. If a gene 
is amplified in a tumor, but the corresponding gene product is not over-expressed, the clinician 
will decide not to treat a patient with agents that target that gene product. This not only saves 
money, but also has the benefit that the patient can avoid exposure to the side effects associated 
with such agents. 

This utility is further supported by the teachings of the article by Hanna and Mornin 
(Pathology Associates Medical Laboratories, August (1999), submitted with the Response filed 
August 3, 2004). The article teaches that the HER-2/neu gene has been shown to be amplified 
and/or over-expressed in 10%-30% of invasive breast cancers and in 40%-60% of intraductal 
breast carcinomas. Further, the article teaches that diagnosis of breast cancer includes testing 
both the amplification of the HER-2/neu gene (by FISH) as well as the over-expression of the 
HER-2/neu gene product (by IHC). Even when the protein is not over-expressed, the assay 
relying on both tests leads to a more accurate classification of the cancer and a more effective 
treatment of it. 

The Examiner asserts that, 

"Hanna et al. supports the instant rejection, in that Hanna et al. show that gene 
amplification does not reliably correlate with polypeptide overexpression, and thus the 
level of polypeptide expression must be tested empirically." (Page 8 of the Final Office 
Action mailed October 18, 2004). 

Appellants respectfully point out that the Examiner appears to have misread Hanna et al. Hanna 
et al. clearly state that gene amplification (as measured by FISH) and polypeptide expression (as 
measured by immunohistochemistry, IHC) are well correlated ("in general, FISH and IHC results 
correlate well" (Hanna et al. p. 1, col. 2)). It is only a subset of tumors which show discordant 
results. Thus, Hanna et al. support Appellants' position rather well that it is more likely than not 
that gene amplification correlates with increased polypeptide expression. The Examiner appears 
to view such testing described in the Ashkenazi Declaration and the Hanna paper as experiments 
involving further characterization of the PRO1097 polypeptide itself. On the contrary, such 
testing is for the purpose of characterizing not the PRO1097 polypeptide, but the tumors in 
which the gene encoding PRO1097 is amplified. That is, such further testing or research is for 
the purpose of characterizing the tumors into medically relevant categories in which the gene 
encoding PRO1097 is/is not amplified, and such techniques were routine in the art of clinical 
oncology at the time of filing of the instant application, as evidenced by the teaching of Hanna 
and Mornin. 

Thus, based on the asserted utility for PRO1097 in the diagnosis of selected lung or colon 
tumors, the reduction to practice of the instantly claimed protein sequence of SEQ ID NO: 349 in 
the present application (also see page 305), the disclosure of the step-by-step protocols for 
making chimeric PRO polypeptides, including those wherein the heterologous polypeptide is an 
epitope tag or an Fc region of an immunoglobulin in the specification (at page 374, line 24 to 
page 375, line 9), the disclosure of a step-by-step protocol for making and expressing PRO1097 
in appropriate host cells (in Examples 140-143 and page 376, line 12), the step-by-step protocol 
for the preparation, isolation and detection of monoclonal, polyclonal and other types of 
antibodies against the PRO1097 protein in the specification (at pages 390-395) and the 
disclosure of the gene amplification assay in Example 170, the skilled artisan would know 
exactly how to make and use the claimed polypeptide for the diagnosis of lung or colon cancers. 
Appellants submit that based on the detailed information presented in the specification and the 
advanced state of the art in oncology, the skilled artisan would have found such testing routine 
and not 'undue'. 

Contrary to the Appellants' assertion of utility, however, the Examiner alleges that the 
gene amplification results presented in Example 170 do not render the presently claimed 
polypeptides patentably useful, and finds the declaratory evidence presented in this case, for 
what Appellants consider legally inappropriate reasons, "non-persuasive". 

Regarding the non-acceptance of the Polakis and Ashkenazi declarations by the 
Examiner, Appellants respectfully draw the Examiner's attention to case law that clearly 
establishes that in considering affidavit evidence, the Examiner must consider all of the evidence 
of record anew (In re Rinehart, 531 F.2d 1084, 189 USPQ 143 (C.C.P.A. 1976) and In re 
Piasecki, 745 F.2d 1015, 226 USPQ 881 (Fed. Cir. 1985)). "After evidence or argument is 
submitted by the applicant in response, patentability is determined on the totality of the record, 
by a preponderance of the evidence with due consideration to persuasiveness of argument" (In re 
Alton, 37 USPQ2d 1578 (Fed. Cir. 1996) at 1584 quoting In re Oetiker, 977 F.2d 1443, 1445, 24 
USPQ2d 1443, 1444 (Fed. Cir. 1992)). Furthermore, the Federal Court of Appeals held in In re 
Alton, "We are aware of no reason why opinion evidence relating to a fact issue should not be 
considered by an examiner" (In re Alton, supra). Appellants further draw the Examiner's 
attention to the Utility Examination Guidelines (Part IIB, 66 Fed. Reg. 1098 (2001)), which 
state, 

"Office personnel must accept an opinion from a qualified expert that is based upon 
relevant facts whose accuracy is not being questioned; it is improper to disregard the 
opinion solely because of a disagreement over the significance or meaning of the facts 
offered." 

The statement in question from the Polakis Declaration that "it is my considered scientific 
opinion that for human genes, an increased level of mRNA in a tumor cell relative to a normal 
cell typically correlates to a similar increase in abundance of the encoded protein in the tumor 
cell relative to the normal cell" is based on his own experimental findings, which are clearly set 
forth in the Declaration. Further, the teachings of Ashkenazi were supported by the HER-2/neu 
gene example in Hanna and Mornin. Accordingly, the fact-based conclusions of Dr. Polakis and 
Dr. Ashkenazi would be considered reasonable and accurate by one skilled in the art. Thus, 
barring evidence to the contrary, Appellants maintain that the fold amplification disclosed for the 
PRO1097 gene is significant and forms the basis for the utility claimed for the PRO1097 
polypeptide herein. 

Therefore, since the instantly claimed invention is supported by either a credible, specific 
and substantial asserted utility or a well-established utility, and since the present specification 
clearly teaches one skilled in the art "how to make and use" the claimed invention without undue 
experimentation, Appellants respectfully request reconsideration and reversal of the outstanding 
rejections under 35 U.S.C. §101 and §112, First Paragraph to Claims 119-126 and 129-131. 

ISSUE 3: Claims 119-126 satisfy the written description requirement of 35 U.S.C. §112, 
First Paragraph 

Claims 119-126 stand rejected under 35 U.S.C. §112, first paragraph as allegedly 
containing "subject matter which was not described in the specification in such a way as to 
reasonably convey to one skilled in the relevant art that the inventor(s), at the time the 
application was filed, had possession of the claimed invention." In particular, the Examiner has 
asserted that "Applicants have not described or shown possession of all polypeptides 80-99% 
homologous to SEQ ID NO:349, that still retain the function of SEQ ID NO: 349. Nor have 
Applicants described a representative number of species that have 80-99% homology to SEQ ID 
NO: 349, such that it is clear that they were in possession of a genus of polypeptides functionally 
similar to SEQ ID NO: 349" (Page 9 of the Final Office Action mailed October 20, 2004). 

Appellants respectfully disagree. 

A. The Legal Test for Written Description 

The well-established test for sufficiency of support under the written description 
requirement of 35 U.S.C. §112, first paragraph is "whether the disclosure of the application as 
originally filed reasonably conveys to the artisan that the inventor had possession at that time of 
the later claimed subject matter, rather than the presence or absence of literal support in the 
specification for the claim language." In re Kaslow, 707 F.2d 1366, 1374, 212 USPQ 1089, 1096 
(Fed. Cir. 1983). See also Vas-Cath, Inc. v. Mahurkar, 935 F.2d at 1563, 19 USPQ2d at 1116 
(Fed. Cir. 1991). The adequacy of written description support is a factual issue and is to be 
determined on a case-by-case basis. The factual determination in a written description analysis 
depends on the nature of the invention and the amount of knowledge imparted to those skilled in 
the art by the disclosure. Union Oil v. Atlantic Richfield Co., 208 F.3d 989, 996 (Fed. Cir. 2000). 
See also M.P.E.P. §2163 II(A). 

In Environmental Designs, Ltd. v. Union Oil Co., 713 F.2d 693, 696, 218 USPQ 865, 868 
(Fed. Cir. 1983), cert. denied, 464 U.S. 1043, the Federal Circuit held, "Factors 
that may be considered in determining level of ordinary skill in the art include (1) the educational 
level of the inventor; (2) type of problems encountered in the art; (3) prior art solutions to those 
problems; (4) rapidity with which innovations are made; (5) sophistication of the technology; 
and (6) educational level of active workers in the field." See also M.P.E.P. §2141.03. Further, 
the "hypothetical 'person having ordinary skill in the art' to which the claimed subject matter 
pertains would, of necessity have the capability of understanding the scientific and engineering 
principles applicable to the pertinent art." Ex parte Hiyamizu, 10 USPQ2d 1393, 1394 (Bd. Pat. 
App. & Inter. 1988) (emphasis [...]). See also M.P.E.P. §2141.03. 

B. The Disclosure Provides Sufficient Written Description for the Claimed 
Invention 

Appellants respectfully submit that the instant specification evidences the actual 
reduction to practice of the amino acid sequence of SEQ ID NO: 349. Appellants also submit 
that the specification provides ample written support for determining percent sequence identity 
between two amino acid sequences (See pages 306-308, line 14 onwards). In fact, the 
specification teaches specific parameters to be associated with the term "percent identity" as 
applied to the present invention. The specification further provides detailed guidance as to 
changes that may be made to a PRO polypeptide without adversely affecting its activity (page 
372, line 36 to page 373, line 17). This guidance includes a listing of exemplary and preferred 
substitutions for each of the twenty naturally occurring amino acids (Table 6, page 372). 
Accordingly, one of skill in the art could identify whether a variant PRO1097 sequence falls 
within the parameters of the claimed invention. Once such an amino acid sequence was 
identified, the specification sets forth methods for making the amino acid sequences (see page 
376, line 9) and methods of preparing the PRO polypeptides (see Examples 140-143). 
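
By way of illustration only, the kind of comparison involved in asking whether a variant sequence meets an 80% identity threshold can be sketched as follows. This is a simplified sketch under stated assumptions: it takes two sequences that are already aligned, counts identical positions over all aligned columns, and treats gaps as mismatches. It is not the method or the parameters set out at pages 306-308 of the specification, which are not reproduced here, and the toy sequences and all names are hypothetical (they are not SEQ ID NO: 349).

```python
# Simplified, hypothetical sketch: percent identity over two pre-aligned
# sequences of equal length, with "-" marking a gap. This is NOT the
# specification's method; it only illustrates the arithmetic of a threshold test.
IDENTITY_THRESHOLD = 80.0  # "at least 80%" as recited in Claim 119


def percent_identity(aligned_ref, aligned_variant):
    """Return identical positions as a percentage of all aligned columns."""
    if len(aligned_ref) != len(aligned_variant):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_ref:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for ref_residue, var_residue in zip(aligned_ref, aligned_variant)
        if ref_residue == var_residue and ref_residue != "-"
    )
    return 100.0 * matches / len(aligned_ref)


def meets_claim_threshold(identity, threshold=IDENTITY_THRESHOLD):
    """Return True when the identity value is at least the threshold."""
    return identity >= threshold


# Toy 10-residue sequences invented for this example (two substitutions).
reference = "MKTLLVAGLL"
variant = "MKTLIVAGLV"

identity = percent_identity(reference, variant)
print(f"percent identity: {identity:.1f}")
print(f"meets {IDENTITY_THRESHOLD}% threshold: {meets_claim_threshold(identity)}")
```

The sketch addresses only the structural (sequence identity) limitation; whether the encoding nucleic acid is amplified in lung or colon tumors is a separate determination made by the assay of Example 170.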

Currently pending Claims 119-126 recite the functional limitation that the nucleic acids 
encoding the claimed polypeptides are amplified in lung or colon tumors. Appellants further 
submit that the specification provides ample written support for detecting and quantifying 
amplification of such nucleic acids in several tumors and/or cell lines as described in Example 
170. Example 170 of the present application provides step-by-step guidelines and protocols for 
the gene amplification assay. By following this disclosure, one skilled in the art would know 
that it is easy to test whether a gene encoding a variant PRO1097 protein is amplified in lung or 
colon tumors by the methods set forth in Example 170. 

Thus, the genus of polypeptides with at least 80% sequence identity to SEQ ID NO: 349, 
which possess the functional property of having a nucleic acid which is amplified in lung or 
colon tumors, would meet the requirement of 35 U.S.C. §112, first paragraph, as providing 
adequate written description. Accordingly, one skilled in the art would have known that 
Appellants had knowledge of and possessed the claimed polypeptides with 80-99% sequence 
identity to SEQ ID NO: 349 whose encoding nucleic acids were amplified in lung or colon 
tumors. The recited property of amplification of the encoding gene adds to the characterization 
of the claimed polypeptide sequences in a manner that one of skill in the art could readily assess 
and understand. 

As discussed above, Appellants have recited structural features, namely, 80% sequence 
identity to SEQ ID NO: 349, which are common to the genus. Appellants have also provided 
guidance as to how to make the recited variants of SEQ ID NO: 349, including listings of 
exemplary and preferred sequence substitutions. The genus of claimed polypeptides is further 
defined by having a specific functional activity for the encoding nucleic acids. Accordingly, a 
description of the claimed genus has been achieved. 

For the above reasons, the specification provides adequate written description for 
polypeptides having at least 80% identity to SEQ ID NO: 349 wherein the nucleic acid encoding 
the polypeptide is amplified in lung or colon tumor. Accordingly, Appellants respectfully 
request reconsideration and reversal of the written description rejection of Claims 119-126 under 
35 U.S.C. §112, first paragraph. 



CONCLUSION 



For the reasons given above, Appellants submit that the present specification clearly 
describes, details and provides a patentable utility for the claimed invention. Moreover, it is 
respectfully submitted that based upon this disclosed patentable utility, the present specification 
clearly teaches "how to use" the presently claimed polypeptide. As such, Appellants respectfully 
request reconsideration and reversal of the outstanding rejection of Claims 119-126 and 129-131. 



HELLER EHRMAN LLP 

275 Middlefield Road 
Menlo Park, California 94025-3506 
Telephone: (650) 324-7000 
Facsimile: (650) 324-0638 



Respectfully submitted, 



Date: October 28, 2005 







IX. CLAIMS APPENDIX 
Claims on Appeal 

119. An isolated polypeptide having at least 80% amino acid sequence identity to: 

(a) the amino acid sequence of the polypeptide of SEQ ID NO:349; 

(b) the amino acid sequence of the polypeptide of SEQ ID NO:349, lacking its 
associated signal peptide; or 

(c) the amino acid sequence of the polypeptide encoded by the full-length coding 
sequence of the cDNA deposited under ATCC accession number 203044; 

wherein, the nucleic acid encoding said polypeptide is amplified in lung or colon cancer. 

120. An isolated polypeptide having at least 85% amino acid sequence identity to: 

(a) the amino acid sequence of the polypeptide of SEQ ID NO:349; 

(b) the amino acid sequence of the polypeptide of SEQ ID NO:349, lacking its 
associated signal peptide; or 

(c) the amino acid sequence of the polypeptide encoded by the full-length coding 
sequence of the cDNA deposited under ATCC accession number 203044; 

wherein, the nucleic acid encoding said polypeptide is amplified in lung or colon cancer. 

121. An isolated polypeptide having at least 90% amino acid sequence identity to: 

(a) the amino acid sequence of the polypeptide of SEQ ID NO:349; 

(b) the amino acid sequence of the polypeptide of SEQ ID NO:349, lacking its 
associated signal peptide; or 

(c) the amino acid sequence of the polypeptide encoded by the full-length coding 
sequence of the cDNA deposited under ATCC accession number 203044; 

wherein, the nucleic acid encoding said polypeptide is amplified in lung or colon cancer. 

122. An isolated polypeptide having at least 95% amino acid sequence identity to: 

(a) the amino acid sequence of the polypeptide of SEQ ID NO:349; 

(b) the amino acid sequence of the polypeptide of SEQ ID NO:349, lacking its 
associated signal peptide; or 

(c) the amino acid sequence of the polypeptide encoded by the full-length coding 
sequence of the cDNA deposited under ATCC accession number 203044; 

wherein, the nucleic acid encoding said polypeptide is amplified in lung or colon cancer. 

123. An isolated polypeptide having at least 99% amino acid sequence identity to: 

(a) the amino acid sequence of the polypeptide of SEQ ID NO:349; 

(b) the amino acid sequence of the polypeptide of SEQ ID NO:349, lacking its 
associated signal peptide; or 

(c) the amino acid sequence of the polypeptide encoded by the full-length coding 
sequence of the cDNA deposited under ATCC accession number 203044; 

wherein, the nucleic acid encoding said polypeptide is amplified in lung or colon cancer. 

124. An isolated polypeptide comprising: 

(a) the amino acid sequence of the polypeptide of SEQ ID NO: 349; 

(b) the amino acid sequence of the polypeptide of SEQ ID NO: 349, lacking its 
associated signal peptide; 

(c) the amino acid sequence of the polypeptide encoded by the full-length coding 
sequence of the cDNA deposited under ATCC accession number 203044; 

wherein, the nucleic acid encoding said polypeptide is amplified in lung or colon cancer. 

125. The isolated polypeptide of Claim 124 comprising the amino acid sequence of the 
polypeptide of SEQ ID NO: 349. 



126. The isolated polypeptide of Claim 124 comprising the amino acid sequence of the 
polypeptide of SEQ ID NO: 349, lacking its associated signal peptide. 

129. The isolated polypeptide of Claim 124 comprising the amino acid sequence of the 
polypeptide encoded by the full-length coding sequence of the cDNA deposited under 
ATCC accession number 203044. 

130. A chimeric polypeptide comprising a polypeptide according to Claim 124 fused to a 
heterologous polypeptide. 

131. The chimeric polypeptide of Claim 130, wherein said heterologous polypeptide is an 
epitope tag or an Fc region of an immunoglobulin. 

X. EVIDENCE APPENDIX 

1. Declaration of Paul Polakis, Ph.D. under 37 C.F.R. § 1.132. 

2. Declaration of Avi Ashkenazi, Ph.D. under 37 C.F.R. § 1.132. 

3. Declaration of Audrey Goddard, Ph.D. under 37 C.F.R. § 1.132. 

4. Orntoft et al, 2002, Mol. and Cell. Proteomics, Vol.1, pages 37-45. 

5. Hyman et al, Cancer Res., 2002, Vol. 62, pages 6240-45. 

6. Pollack et al, PNAS, 2002, Vol. 99, pages 12963-12968. 

7. Hanna and Mornin, 1999, Pathology Associates Medical Laboratories. 

8. Hittelman et al 2001, Ann. N.Y. Acad. Sci. 952: 1-12. 

9. Crowell et al. 1996, Cancer Epidemiol., 5: 631-37. 

10. Skolnick et al, 2000, Trends in Biotech., 18:34-39. 

11. Bork et al, 2000, Genome Res. 10: 398-400. 

12. Doerks et al, 1998, Trends in Genetics, 14: 248-250. 

13. Hesselgesser et al, 1997, Meth. in Enzymol., 287: 59-69. 

14. Blease et al, 2000, Resp. Res., 1(1): 54-61. 

15. Smith et al, 1997, Nat. Biotechnol., 15: 1222-23. 

16. Brenner et al, 1999, Trends in Genetics, 15: 132-133. 

17. Pennica et al, Proc. Nat. Acad. Sci., 1998, Vol. 95, pages 14717-14722. 

18. Haynes et al, Electrophoresis, 1998, Vol. 19, pages 1862-71. 

19. Hu et al, J. Proteome Res., 2003, Vol. 2, pages 405-412. 

Items 1, 2, 4-7 were submitted with Appellants' Response filed August 3, 2004, and were 
considered by the Examiner as indicated in the Final Office action mailed October 18, 2004. 

Item 3 is hereby submitted with the Appellants' brief. As indicated above, this declaration was 
not presented earlier because the issue whether the "fold increase" in the gene amplification 
assay was "significant" was not raised earlier. However, Appellants believe that presentation of 
the Goddard Declaration as evidence that the "increase in DNA" in the gene amplification assay 
is significant is necessary in this case and presents the case in better form for appeal. Its 
consideration is respectfully requested. 

Items 8-16 were made of record by the Examiner in the Office Action mailed May 3, 2004. 

Items 17-19 were made of record by the Examiner in the Final Office Action mailed October 18, 
2004. 

XI. RELATED PROCEEDINGS APPENDIX 

None. 

DECLARATION OF PAUL POLAKIS, Ph.D. 

I, Paul Polakis, Ph.D., declare and say as follows: 

1. I was awarded a Ph.D. by the Department of Biochemistry of the Michigan 
State University in 1984. My scientific Curriculum Vitae is attached to and forms 
part of this Declaration (Exhibit A). 

2. I am currently employed by Genentech, Inc. where my job title is Staff 
Scientist. Since joining Genentech in 1999, one of my primary responsibilities has 
been leading Genentech's Tumor Antigen Project, which is a large research project 
with a primary focus on identifying tumor cell markers that find use as targets for 
both the diagnosis and treatment of cancer in humans. 

3. As part of the Tumor Antigen Project, my laboratory has been analyzing 
differential expression of various genes in tumor cells relative to normal cells. 
The purpose of this research is to identify proteins that are abundantly expressed 
on certain tumor cells and that are either (i) not expressed, or (ii) expressed at 
lower levels, on corresponding normal cells. We call such differentially expressed 
proteins "tumor antigen proteins". When such a tumor antigen protein is 
identified, one can produce an antibody that recognizes and binds to that protein. 
Such an antibody finds use in the diagnosis of human cancer and may ultimately 
serve as an effective therapeutic in the treatment of human cancer. 

4. In the course of the research conducted by Genentech's Tumor Antigen 
Project, we have employed a variety of scientific techniques for detecting and 
studying differential gene expression in human tumor cells relative to normal cells, 
at genomic DNA, mRNA and protein levels. An important example of one such 
technique is the well known and widely used technique of microarray analysis 
which has proven to be extremely useful for the identification of mRNA molecules 
that are differentially expressed in one tissue or cell type relative to another. In the 
course of our research using microarray analysis, we have identified 
approximately 200 gene transcripts that are present in human tumor cells at 
significantly higher levels than in corresponding normal human cells. To date, we 
have generated antibodies that bind to about 30 of the tumor antigen proteins 
expressed from these differentially expressed gene transcripts and have used these 
antibodies to quantitatively determine the level of production of these tumor 
antigen proteins in both human cancer cells and corresponding normal cells. We 
have then compared the levels of mRNA and protein in both the tumor and normal 
cells analyzed. 

5. From the mRNA and protein expression analyses described in paragraph 4 
above, we have observed that there is a strong correlation between changes in the 
level of mRNA present in any particular cell type and the level of protein 
expressed from that mRNA in that cell type. In approximately 80% of our 
observations we have found that increases in the level of a particular mRNA 
correlate with changes in the level of protein expressed from that mRNA when 
human tumor cells are compared with their corresponding normal cells. 

6. Based upon my own experience accumulated in more than 20 years of 
research, including the data discussed in paragraphs 4 and 5 above and my 
knowledge of the relevant scientific literature, it is my considered scientific 
opinion that for human genes, an increased level of mRNA in a tumor cell relative 
to a normal cell typically correlates to a similar increase in abundance of the 
encoded protein in the tumor cell relative to the normal cell. In fact, it remains a 
central dogma in molecular biology that increased mRNA levels are predictive of 
corresponding increased levels of the encoded protein. While there have been 
published reports of genes for which such a correlation does not exist, it is my 
opinion that such reports are exceptions to the commonly understood general rule 
that increased mRNA levels are predictive of corresponding increased levels of the 
encoded protein. 

7. I hereby declare that all statements made herein of my own knowledge are 
true and that all statements made on information or belief are believed to be true, 
and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, 
under Section 1001 of Title 18 of the United States Code and that such willful 
statements may jeopardize the validity of the application or any patent issued 
thereon. 



Dated: [handwritten date illegible] 




Paul Polakis, Ph.D. 

65. Kischkel, F.C., Lawrence, D. A., Tinel, A., Virmani, A., Schow, P., Gazdar, A., 
Blenis, J., Arnott, D., and Ashkenazi, A. Death receptor recruitment of 
endogenous caspase-10 and apoptosis initiation in the absence of caspase-8. J. 
Biol. Chem. 276, 46639-46646 (2001). 

66. LeBlanc, H., Lawrence, D.A., Varfolomeev, E., Totpal, K., Morlan, J., Schow, P., 
Fong, S., Schwall, R., Sinicropi, D., and Ashkenazi, A. Tumor cell resistance to 
death receptor induced apoptosis through mutational inactivation of the 
proapoptotic Bcl-2 homolog Bax. Nature Med. 8, 274-281 (2002). 

67. Miller, K., Meng, G., Liu, J., Hurst, A., Hsei, V., Wong, W-L, Ekert, R., 
Lawrence, D., Sherwood, S., DeForge, L., Gaudreault., Keller, G., Sliwkowski, 
M., Ashkenazi, A., and Presta, L. Design, Construction, and analyses of 
multivalent antibodies. J. Immunol. 170, 4854-4861 (2003). 

68. Varfolomeev, E., Kischkel, F., Martin, F., Wanh, H., Lawrence, D., Olsson, C., 
Tom, L., Erickson, S., French, D., Schow, P., Grewal, I. and Ashkenazi, A. 
Immune system development in APRIL knockout mice. Submitted. 

Review articles: 

1. Ashkenazi, A., Peralta, E., Winslow, J., Ramachandran, J., and Capon, D. J. 
Functional role of muscarinic acetylcholine receptor subtype diversity. Cold 
Spring Harbor Symposium on Quantitative Biology. LIII, 263-272 (1988). 

2. Ashkenazi, A., Peralta, E., Winslow, J., Ramachandran, J., and Capon, D. 
Functional diversity of muscarinic receptor subtypes in cellular signal 
transduction and growth. Trends Pharmacol. Sci. Dec Supplement, 12-21 (1989). 

3. Chamow, S., Duliege, A., Ammann, A., Kahn, J., Allen, D., Eichberg, J., Byrn, 
R., Capon, D., Ward, R., and Ashkenazi, A. CD4 immunoadhesins in anti-HIV 
therapy: new developments. Int. J. Cancer Supplement 7, 69-72 (1992). 

4. Ashkenazi, A., Capon, D., and Ward, R. Immunoadhesins. Int. Rev. Immunol. 10, 
217-225 (1993). 

5. Ashkenazi, A., and Peralta, E. Muscarinic Receptors. In Handbook of Receptors 
and Channels. (S. Peroutka, ed.), CRC Press, Boca Raton, Vol. I, p. 1-27, (1994). 

6. Krantz, S. B., Means, R. T., Jr., Lina, J., Marsters, S. A., and Ashkenazi, A. 
Inhibition of erythroid colony formation in vitro by gamma interferon. In 
Molecular Biology of Hematopoiesis (N. Abraham, R. Shadduck, A. Levine, F. 
Takaku, eds.) Intercept Ltd. Paris, Vol. 3, p. 135-147 (1994). 

7. Ashkenazi, A. Cytokine neutralization as a potential therapeutic approach for 
SIRS and shock. J. Biotechnology in Healthcare 1, 197-206 (1994). 

8. Ashkenazi, A., and Chamow, S. M. Immunoadhesins: an alternative to human 
monoclonal antibodies. Immunomethods: A companion to Methods in 
Enzymology 8, 104-115 (1995). 

9. Chamow, S., and Ashkenazi, A. Immunoadhesins: Principles and Applications. 
Trends Biotech. 14, 52-60 (1996). 

10. Ashkenazi, A., and Chamow, S. M. Immunoadhesins as research tools and 
therapeutic agents. Curr. Opin. Immunol. 9, 195-200 (1997). 

11. Ashkenazi, A., and Dixit, V. Death receptors: signaling and modulation. Science 
281, 1305-1308 (1998). 

12. Ashkenazi, A., and Dixit, V. Apoptosis control by death and decoy receptors. 
Curr. Opin. Cell. Biol. 11, 255-260 (1999). 

13. Ashkenazi, A. Chapters on Apo2L/TRAIL; DR4, DR5, DcR1, DcR2; and DcR3. 
Online Cytokine Handbook (www.apnet.com/cytokinereference/). 

14. Ashkenazi, A. Targeting death and decoy receptors of the tumor necrosis factor 
superfamily. Nature Rev. Cancer 2, 420-430 (2002). 

15. LeBlanc, H. and Ashkenazi, A. Apoptosis signaling by Apo2L/TRAIL. Cell Death 
and Differentiation 10, 66-75 (2003). 

16. Almasan, A. and Ashkenazi, A. Apo2L/TRAIL: apoptosis signaling, biology, and 
potential for cancer therapy. Cytokine and Growth Factor Reviews 14, 337-348 
(2003). 

Book: 

Antibody Fusion Proteins (Chamow, S., and Ashkenazi, A., eds., John Wiley and 
Sons Inc.) (1999). 

Talks: 

1. Resistance of primary HIV isolates to CD4 is independent of CD4-gp120 binding 
affinity. UCSD Symposium, HIV Disease: Pathogenesis and Therapy. 
Greenelefe, FL, March 1991. 

2. Use of immuno-hybrids to extend the half-life of receptors. IBC conference on 
Biopharmaceutical Halflife Extension. New Orleans, LA, June 1992. 

3. Results with TNF receptor Immunoadhesins for the Treatment of Sepsis. IBC 
conference on Endotoxemia and Sepsis. Philadelphia, PA, June 1992. 

4. Immunoadhesins: an alternative to human antibodies. IBC conference on 
Antibody Engineering. San Diego, CA, December 1993. 

5. Tumor necrosis factor receptor: a potential therapeutic for human septic shock. 
American Society for Microbiology Meeting, Atlanta, GA, May 1993. 

6. Protective efficacy of TNF receptor immunoadhesin vs anti-TNF monoclonal 
antibody in a rat model for endotoxic shock. 5th International Congress on TNF. 
Asilomar, CA, May 1994. 

7. Interferon-γ signals via a multisubunit receptor complex that contains two types of 
polypeptide chain. American Association of Immunologists Conference. San 
Francisco, CA, July 1995. 

8. Immunoadhesins: Principles and Applications. Gordon Research Conference on 
Drug Delivery in Biology and Medicine. Ventura, CA, February 1996. 

9. Apo-2 Ligand, a new member of the TNF family that induces apoptosis in tumor 
cells. Cambridge Symposium on TNF and Related Cytokines in Treatment of 
Cancer. Hilton-Head, NC, March 1996. 

10. Induction of apoptosis by Apo2 Ligand. American Society for Biochemistry and 
Molecular Biology, Symposium on Growth Factors and Cytokine Receptors. New 
Orleans, LA, June, 1996. 

11. Apo2 ligand, an extracellular trigger of apoptosis. 2nd Clontech Symposium, 
Palo Alto, CA, October 1996. 

12. Regulation of apoptosis by members of the TNF ligand and receptor families. 
Stanford University School of Medicine, Palo Alto, CA, December 1996. 

13. Apo-3: a novel receptor that regulates cell death and inflammation. 4th 
International Congress on Immune Consequences of Trauma, Shock, and Sepsis. 
Munich, Germany, March 1997. 

14. New members of the TNF ligand and receptor families that regulate apoptosis, 
inflammation, and immunity. UCLA School of Medicine, LA, CA, March 1997. 

15. Immunoadhesins: an alternative to monoclonal antibodies. 5th World Conference 
on Bispecific Antibodies. Volendam, Holland, June 1997. 

16. Control of Apo2L signaling. Cold Spring Harbor Laboratory Symposium on 
Programmed Cell Death. Cold Spring Harbor, New York. September, 1997. 

17. Chairman and speaker, Apoptosis Signaling session. IBC's 4th Annual 
Conference on Apoptosis. San Diego, CA., October 1997. 

18. Control of Apo2L signaling by death and decoy receptors. American Association 
for the Advancement of Science. Philadelphia, PA, February 1998. 

19. Apo2 ligand and its receptors. American Society of Immunologists. San 
Francisco, CA, April 1998. 

20. Death receptors and ligands. 7th International TNF Congress. Cape Cod, MA, 
May 1998. 

21. Apo2L as a potential therapeutic for cancer. UCLA School of Medicine. LA, 
CA, June 1998. 

22. Apo2L as a potential therapeutic for cancer. Gordon Research Conference on 
Cancer Chemotherapy. New London, NH, July 1998. 

23. Control of apoptosis by Apo2L. Endocrine Society Conference, Stevenson, WA, 
August 1998. 

24. Control of apoptosis by Apo2L. International Cytokine Society Conference, 
Jerusalem, Israel, October 1998. 

25. Apoptosis control by death and decoy receptors. American Association for 
Cancer Research Conference, Whistler, BC, Canada, March 1999. 

26. Apoptosis control by death and decoy receptors. American Society for 
Biochemistry and Molecular Biology Conference, San Francisco, CA May 1999. 

27. Apoptosis control by death and decoy receptors. Gordon Research Conference on 
Apoptosis, New London, NH, June 1999. 

28. Apoptosis control by death and decoy receptors. Arthritis Foundation Research 
Conference, Alexandria GA, Aug 1999. 

29. Safety and anti-tumor activity of recombinant soluble Apo2L/TRAIL. Cold 
Spring Harbor Laboratory Symposium on Programmed Cell Death. Cold Spring 
Harbor, NY, September 1999. 

30. The Apo2L/TRAIL system: therapeutic potential. American Association for 
Cancer Research, Lake Tahoe, NV, Feb 2000. 

31. Apoptosis and cancer therapy. Stanford University School of Medicine, Stanford, 
CA, Mar 2000. 

32. Apoptosis and cancer therapy. University of Pennsylvania School of Medicine, 
Philadelphia, PA, Apr 2000. 

33. Apoptosis signaling by Apo2L/TRAIL. International Congress on TNF. 
Trondheim, Norway, May 2000. 

34. The Apo2L/TRAIL system: therapeutic potential. Cap-CURE summit meeting. 
Santa Monica, CA, June 2000. 

35. The Apo2L/TRAIL system: therapeutic potential. MD Anderson Cancer Center. 
Houston, TX, June 2000. 

36. Apoptosis signaling by Apo2L/TRAIL. The Protein Society, 14th Symposium. 
San Diego, CA, August 2000. 

37. Anti-tumor activity of Apo2L/TRAIL. AAPS annual meeting. Indianapolis, IN, 
Aug 2000. 

38. Apoptosis signaling and anti-cancer potential of Apo2L/TRAIL. Cancer Research 
Institute, UC San Francisco, CA, September 2000. 

39. Apoptosis signaling by Apo2L/TRAIL. Keynote address, TNF family 
Minisymposium, NIH. Bethesda, MD, September 2000. 

40. Death receptors: signaling and modulation. Keystone symposium on the 
Molecular basis of cancer. Taos, NM, Jan 2001. 

41. Preclinical studies of Apo2L/TRAIL in cancer. Symposium on Targeted therapies 
in the treatment of lung cancer. Aspen, CO, Jan 2001. 

42. Apoptosis signaling by Apo2L/TRAIL. Weizmann Institute of Science, Rehovot, 
Israel, March 2001. 

43. Apo2L/TRAIL: Apoptosis signaling and potential for cancer therapy. Weizmann 
Institute of Science, Rehovot, Israel, March 2001. 

44. Targeting death receptors in cancer with Apo2L/TRAIL. Cell Death and Disease 
conference, North Falmouth, MA, Jun 2001. 

45. Targeting death receptors in cancer with Apo2L/TRAIL. Biotechnology 
Organization conference, San Diego, CA, Jun 2001. 

46. Apo2L/TRAIL signaling and apoptosis resistance mechanisms. Gordon Research 
Conference on Apoptosis, Oxford, UK, July 2001. 

47. Apo2L/TRAIL signaling and apoptosis resistance mechanisms. Cleveland Clinic 
Foundation, Cleveland, OH, Oct 2001. 

48. Apoptosis signaling by death receptors: overview. International Society for 
Interferon and Cytokine Research conference, Cleveland, OH, Oct 2001. 

49. Apoptosis signaling by death receptors. American Society of Nephrology 
Conference. San Francisco, CA, Oct 2001. 

50. Targeting death receptors in cancer. Apoptosis: commercial opportunities. San 
Diego, CA, Apr 2002. 

51. Apo2L/TRAIL signaling and apoptosis resistance mechanisms. Kimmel Cancer 
Research Center, Johns Hopkins University, Baltimore MD. May 2002. 

52. Apoptosis control by Apo2L/TRAIL. (Keynote Address) University of Alabama 
Cancer Center Retreat, Birmingham, AL. October 2002. 

53. Apoptosis signaling by Apo2L/TRAIL. (Session co-chair) TNF international 
conference. San Diego, CA. October 2002. 

54. Apoptosis signaling by Apo2L/TRAIL. Swiss Institute for Cancer Research 
(ISREC). Lausanne, Switzerland. Jan 2003. 

55. Apoptosis induction with Apo2L/TRAIL. Conference on New Targets and 
Innovative Strategies in Cancer Treatment. Monte Carlo. February 2003. 

56. Apoptosis signaling by Apo2L/TRAIL. Hermelin Brain Tumor Center 
Symposium on Apoptosis. Detroit, MI. April 2003. 

57. Targeting apoptosis through death receptors. Sixth Annual Conference on 
Targeted Therapies in the Treatment of Breast Cancer. Kona, Hawaii. July 2003. 

58. Targeting apoptosis through death receptors. Second International Conference on 
Targeted Cancer Therapy. Washington, DC. Aug 2003. 

Issued Patents: 

1. Ashkenazi, A., Chamow, S. and Kogan, T. Carbohydrate-directed crosslinking 
reagents. US patent 5,329,028 (Jul 12, 1994). 

2. Ashkenazi, A., Chamow, S. and Kogan, T. Carbohydrate-directed crosslinking 
reagents. US patent 5,605,791 (Feb 25, 1997). 

3. Ashkenazi, A., Chamow, S. and Kogan, T. Carbohydrate-directed crosslinking 
reagents. US patent 5,889,155 (Jul 27, 1999). 

4. Ashkenazi, A., APO-2 Ligand. US patent 6,030,945 (Feb 29, 2000). 

5. Ashkenazi, A., Chuntharapai, A., Kim, J., APO-2 ligand antibodies. US patent 
6,046,048 (Apr 4, 2000). 

6. Ashkenazi, A., Chamow, S. and Kogan, T. Carbohydrate-directed crosslinking 
reagents. US patent 6,124,435 (Sep 26, 2000). 

7. Ashkenazi, A., Chuntharapai, A., Kim, J., Method for making monoclonal and cross- 
reactive antibodies. US patent 6,252,050 (Jun 26, 2001). 

8. Ashkenazi, A. APO-2 Receptor. US patent 6,342,369 (Jan 29, 2002). 

9. Ashkenazi, A. Fong, S., Goddard, A., Gurney, A., Napier, M, Tumas, D., Wood, W. 
A-33 polypeptides. US patent 6,410,708 (Jun 25, 2002). 

10. Ashkenazi, A. APO-3 Receptor. US patent 6,462,176 B1 (Oct 8, 2002). 

11. Ashkenazi, A. APO-2LI and APO-3 polypeptide antibodies. US patent 6,469,144 B1 
(Oct 22, 2002). 

12. Ashkenazi, A., Chamow, S. and Kogan, T. Carbohydrate-directed crosslinking 
reagents. US patent 6,582,928 B1 (Jun 24, 2003). 

THE UNITED STATES PATENT AND TRADEMARK OFFICE 

PATENT 

In re Application of: Ashkenazi et al. 
Serial No.: 09/903,925 
Filed: July 11, 2001 
Group Art Unit: 1647 
Examiner: Fozia Hamid 
For: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC ACIDS 

I hereby certify that this correspondence is being deposited with the United 
States Postal Service [illegible] addressed to: Assistant Commissioner [illegible] 

DECLARATION OF AUDREY D. GODDARD, Ph.D. UNDER 37 C.F.R. § 1.132 

Assistant Commissioner of Patents 
Washington, D.C. 20231 

Sir: 

I, Audrey D. Goddard, Ph.D., do hereby declare and say as follows: 

1. I am a Senior Clinical Scientist at the Experimental Medicine/BioOncology, Medical 
Affairs Department of Genentech, Inc., South San Francisco, California 94080. 

2. Between 1993 and 2001, I headed the DNA Sequencing Laboratory at the Molecular 
Biology Department of Genentech, Inc. During this time, my responsibilities included the 
identification and characterization of genes contributing to the oncogenic process, and determination 
of the chromosomal localization of novel genes. 

3. My scientific Curriculum Vitae, including my list of publications, is attached to and 
forms part of this Declaration (Exhibit A). 

4. I am familiar with a variety of techniques known in the art for detecting and 
quantifying the amplification of oncogenes in cancer, including the quantitative TaqMan PCR (i.e., 
"gene amplification") assay described in the above captioned patent application. 

5. The TaqMan PCR assay is described, for example, in the following scientific 
publications: Higuchi et al, Biotechnology 10:413-417 (1992) (Exhibit B); Livak et al, PCR 
Methods Appl. 4:357-362 (1995) (Exhibit C) and Heid et al., Genome Res. 6:986-994 (1996) 
(Exhibit D). Briefly, the assay is based on the principle that successful PCR yields a fluorescent 
signal due to Taq DNA polymerase-mediated exonuclease digestion of a fluorescently labeled 
oligonucleotide that is homologous to a sequence between two PCR primers. The extent of 
digestion depends directly on the amount of PCR, and can be quantified accurately by measuring the 
increment in fluorescence that results from decreased energy transfer. This is an extremely sensitive 
technique, which allows detection in the exponential phase of the PCR reaction and, as a result, 
leads to accurate determination of gene copy number. 

6. The quantitative fluorescent TaqMan PCR assay has been extensively and 
successfully used to characterize genes involved in cancer development and progression. 
Amplification of protooncogenes has been studied in a variety of human tumors, and is widely 
considered as having etiological, diagnostic and prognostic significance. This use of the quantitative 
TaqMan PCR assay is exemplified by the following scientific publications: Pennica et al., Proc. 
Natl. Acad. Sci. USA 95(25):14717-14722 (1998) (Exhibit E); Pitti et al, Nature 
396(6712):699-703 (1998) (Exhibit F) and Bieche et al., Int. J. Cancer 78:661-666 (1998) (Exhibit 
G), the first two of which I am co-author. In particular, Pennica et al. have used the quantitative 
TaqMan PCR assay to study relative gene amplification of WISP and c-myc in various cell lines, 
colorectal tumors and normal mucosa. Pitti et al. studied the genomic amplification of a decoy 
receptor for Fas ligand in lung and colon cancer, using the quantitative TaqMan PCR assay. Bieche 
et al. used the assay to study gene amplification in breast cancer. 



7. It is my personal experience that the quantitative TaqMan PCR technique is 
technically sensitive enough to detect at least a 2-fold increase in gene copy number relative to 
control. It is further my considered scientific opinion that an at least 2-fold increase in gene copy 
number in a tumor tissue sample relative to a normal (i.e., non-tumor) sample is significant and 
useful in that the detected increase in gene copy number in the tumor sample relative to the normal 
sample serves as a basis for using relative gene copy number as quantitated by the TaqMan PCR 
technique as a diagnostic marker for the presence or absence of tumor in a tissue sample of unknown 
pathology. Accordingly, a gene identified as being amplified at least 2-fold by the quantitative 
TaqMan PCR assay in a tumor sample relative to a normal sample is useful as a marker for the 
diagnosis of cancer, for monitoring cancer development and/or for measuring the efficacy of cancer 
therapy. 

8. I declare further that all statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be true. I declare that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States 
Code, and that such willful false statements may jeopardize the validity of the application or any 
patent issuing thereon. 



Date 



Audrey D. Goddard, PhD. 







AUDREY D. GODDARD, Ph.D. 

Genentech, Inc. 
1 DNA Way 
South San Francisco, CA, 94080 
650.225.6429 
goddarda@gene.com 

110 Congo St. 
San Francisco, CA, 94131 
415.841.9154 
415.819.2247 (mobile) 
agoddard@pacbell.net 

PROFESSIONAL EXPERIENCE 

Genentech, Inc. 1993-present 
South San Francisco, CA 

2001 - present Senior Clinical Scientist 

Experimental Medicine / BioOncology, Medical Affairs 

Responsibilities: 

• Companion diagnostic oncology products 

• Acquisition of clinical samples from Genentech's clinical trials for translational research 

• Translational research using clinical specimen and data for drug development and 
diagnostics 

• Member of Development Science Review Committee, Diagnostic Oversight Team, 21 CFR 
Part 11 Subteam 

Interests: 

• Ethical and legal implications of experiments with clinical specimens and data 

• Application of pharmacogenomics in clinical trials 



1998 - 2001 Senior Scientist 

Head of the DNA Sequencing Laboratory, Molecular Biology Department, Research 
Responsibilities: 

• Management of a laboratory of up to nineteen, including postdoctoral fellow, associate 
scientist, senior research associate and research assistants/associate levels 

• Management of a $750K budget 

• DNA sequencing core facility supporting a 350+ person research facility. 

• DNA sequencing for high throughput gene discovery - ESTs, cDNAs, and constructs 

• Genomic sequence analysis and gene identification 

• DNA sequence and primary protein analysis 

Research: 

• Chromosomal localization of novel genes 

• Identification and characterization of genes contributing to the oncogenic process 

• Identification and characterization of genes contributing to inflammatory diseases 

• Design and development of schemes for high throughput genomic DNA sequence analysis 

• Candidate gene prediction and evaluation 

1993 - 1998 Scientist 

Head of the DNA Sequencing Laboratory, Molecular Biology Department, Research 
Responsibilities: 

• DNA sequencing core facility supporting a 350+ person research facility 

• Assumed responsibility for a pre-existing team of five technicians and expanded the group 
into fifteen, introducing a level of middle management and additional areas of research 

• Participated in the development of the basic plan for high throughput secreted protein 
discovery program - sequencing strategies, data analysis and tracking, database design 

• High throughput EST and cDNA sequencing for new gene identification. 

• Design and implementation of analysis tools required for high throughput gene identification. 

• Chromosomal localization of genes encoding novel secreted proteins. 

Research: 

• Genomic sequence scanning for new gene discovery. 

• Development of signal peptide selection methods. 

• Evaluation of candidate disease genes. 

• Growth hormone receptor gene SNPs in children with Idiopathic short stature 

Imperial Cancer Research Fund 1989-1992 
London, UK with Dr. Ellen Solomon 

6/89-12/92 Postdoctoral Fellow 

• Cloning and characterization of the genes fused at the acute promyelocytic leukemia 
translocation breakpoints on chromosomes 17 and 15. 

• Prepared a successfully funded European Union multi-center grant application 

McMaster University 1983 
Hamilton, Ontario, Canada with Dr. G. D. Sweeney 

5/83 - 8/83: NSERC Summer Student 

• In vitro metabolism of β-naphthoflavone in C57BL/6J and DBA mice



EDUCATION 



Ph.D., 1989
University of Toronto, Toronto, Ontario, Canada. Department of Medical Biophysics.
"Phenotypic and genotypic effects of mutations in the human retinoblastoma gene."
Supervisor: Dr. R. A. Phillips

Honours B.Sc., 1983
McMaster University, Hamilton, Ontario, Canada. Department of Biochemistry.
"The in vitro metabolism of the cytochrome P-448 inducer β-naphthoflavone in C57BL/6J mice."
Supervisor: Dr. G. D. Sweeney

ACADEMIC AWARDS

Imperial Cancer Research Fund Postdoctoral Fellowship, 1989-1992
Medical Research Council Studentship, 1983-1988
NSERC Undergraduate Summer Research Award, 1983
Society of Chemical Industry Merit Award (Hons. Biochem.), 1983
Dr. Harry Lyman Hooker Scholarship, 1981-1983
J.L.W. Gill Scholarship, 1981-1982
Business and Professional Women's Club Scholarship, 1980-1981
Wyerhauser Foundation Scholarship, 1979-1980



INVITED PRESENTATIONS 

Genentech's gene discovery pipeline: High throughput identification, cloning and 
characterization of novel genes. Functional Genomics: From Genome to Function, Litchfield 
Park, AZ, USA. October 2000 

High throughput identification, cloning and characterization of novel genes. G2K: Back to
Science, Advances in Genome Biology and Technology I. Marco Island, FL, USA. February
2000

Quality control in DNA Sequencing: The use of Phred and Phrap. Bay Area Sequencing
Users Meeting, Berkeley, CA, USA. April 1999

High throughput secreted protein identification and cloning. Tenth International Genome
Sequencing and Analysis Conference, Miami, FL, USA. September 1998

The evolution of DNA sequencing: The Genentech perspective. Bay Area Sequencing Users
Meeting, Berkeley, CA, USA. May 1998

Partial Growth Hormone Insensitivity: The role of GH-receptor mutations in Idiopathic Short
Stature. Tenth Annual National Cooperative Growth Study Investigators Meeting, San
Francisco, CA, USA. October 1996

Growth hormone (GH) receptor defects are present in selected children with non-GH-deficient
short stature: A molecular basis for partial GH-insensitivity. 76th Annual Meeting of The
Endocrine Society, Anaheim, CA, USA. June 1994

A previously uncharacterized gene, myl, is fused to the retinoic acid receptor alpha gene in
acute promyelocytic leukemia. XV International Association for Comparative Research on
Leukemia and Related Disease, Padua, Italy. October 1991

PATENTS

Goddard A, Godowski PJ, Gurney AL. NL2 Tie ligand homologue polypeptide. Patent
Number: 6,455,496. Date of Patent: Sept. 24, 2002.

Goddard A, Godowski PJ and Gurney AL. NL3 Tie ligand homologue nucleic acids. Patent
Number: 6,426,218. Date of Patent: July 30, 2002.

Godowski P, Gurney A, Hillan KJ, Botstein D, Goddard A, Roy M, Ferrara N, Tumas D,
Schwall R. NL4 Tie ligand homologue nucleic acid. Patent Number: 6,413,770. Date of
Patent: July 2, 2002.

Ashkenazi A, Fong S, Goddard A, Gurney AL, Napier MA, Tumas D, Wood WI. Nucleic acid
encoding A-33 related antigen polypeptides. Patent Number: 6,410,708. Date of Patent:
Jun. 25, 2002.

Botstein DA, Cohen RL, Goddard AD, Gurney AL, Hillan KJ, Lawrence DA, Levine AJ,
Pennica D, Roy MA and Wood WI. WISP polypeptides and nucleic acids encoding same.
Patent Number: 6,387,657. Date of Patent: May 14, 2002.

Goddard A, Godowski PJ and Gurney AL. Tie ligands. Patent Number: 6,372,491. Date of 
Patent: April 16, 2002. 

Godowski PJ, Gurney AL, Goddard A and Hillan K. TIE ligand homologue antibody. Patent 
Number: 6,350,450. Date of Patent: Feb. 26, 2002. 

Fong S, Ferrara N, Goddard A, Godowski PJ, Gurney AL, Hillan K and Williams PM. Tie
receptor tyrosine kinase ligand homologues. Patent Number: 6,348,351. Date of Patent:
Feb. 19, 2002.

Goddard A, Godowski PJ and Gurney AL. Ligand homologues. Patent Number: 6,348,350. 
Date of Patent: Feb. 19, 2002. 

Attie KM, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth 
hormone insensitivity syndrome. Patent Number: 6,207,640. Date of Patent: March 27, 
2001. 

Fong S, Ferrara N, Goddard A, Godowski PJ, Gurney AL, Hillan K and Williams PM. Nucleic
acids encoding NL-3. Patent Number: 6,074,873. Date of Patent: June 13, 2000.

Attie K, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth hormone
insensitivity syndrome. Patent Number: 5,824,642. Date of Patent: October 20, 1998.

Attie K, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth hormone
insensitivity syndrome. Patent Number: 5,646,113. Date of Patent: July 8, 1997.



Multiple additional provisional applications filed 







PUBLICATIONS 

Seshasayee D, Dowd P, Gu Q, Erickson S, Goddard AD. Comparative sequence analysis of
the HER2 locus in mouse and man. Manuscript in preparation.

Abuzzahab MJ, Goddard A, Grigorescu F, Lautier C, Smith RJ and Chernausek SD. Human 
IGF-1 receptor mutations resulting in pre- and post-natal growth retardation. Manuscript in 
preparation. 

Aggarwal S, Xie M-H, Foster J, Frantz G, Stinson J, Corpuz RT, Simmons L, Hillan K,
Yansura DG, Vandlen RL, Goddard AD and Gurney AL. FHFR, a novel receptor for the 
fibroblast growth factors. Manuscript submitted. 

Adams SH, Chui C, Schilbach SL, Yu XX, Goddard AD, Grimaldi JC, Lee J, Dowd P, Colman 
S., Lewin DA. (2001) BFIT, a unique acyl-CoA thioesterase induced in thermogenic brown 
adipose tissue: Cloning, organization of the human gene, and assessment of a potential link 

We have enhanced the polymerase chain reaction (PCR) such that specific DNA sequences
can be detected without opening the reaction tube. This enhancement requires the addition
of ethidium bromide (EtBr) to a PCR. Since the fluorescence of EtBr increases in the
presence of double-stranded (ds) DNA, an increase in fluorescence in such a PCR indicates
a positive amplification, which can be easily monitored externally. In fact, amplification
can be continuously monitored in order to follow its progress. The ability to simultaneously
amplify specific DNA sequences and detect the product of the amplification both simplifies
and improves PCR and may facilitate its automation and more widespread use in the clinic
or in other situations requiring high sample throughput.

Although the potential benefits of PCR [1] to clinical diagnostics are well known [2,3], it
is still not widely used in this setting, even though it is four years since thermostable
DNA polymerases [4] made PCR practical. Some of the reasons for its slow acceptance are
high cost, lack of automation of pre- and post-PCR processing steps, and false positive
results from carryover-contamination. The first two points are related in that labor is the
largest contributor to cost at the present stage of PCR development. Most current assays
require some form of "downstream" processing once thermocycling is done in order to
determine whether the target DNA sequence was present and has amplified. These include
DNA hybridization [5,6], gel electrophoresis with or without use of restriction digestion
[7,8], HPLC [9], or capillary electrophoresis [10]. These methods are labor-intense, have
low throughput, and are difficult to automate. The third point is also closely related to
downstream processing. The handling of the PCR product in these downstream processes
increases the chances that amplified DNA will spread through the typing lab, resulting in
a risk of "carryover" false positives in subsequent testing.

These downstream processing steps would be eliminated if specific amplification and
detection of amplified DNA took place simultaneously within an unopened reaction vessel.
Assays in which such different processes take place without the need to separate reaction
components have been termed "homogeneous". No truly homogeneous PCR assay has been
demonstrated to date, although progress towards this end has been reported. Chehab et al.
developed a PCR product detection scheme using fluorescent primers that resulted in a
fluorescent PCR product. Allele-specific primers, each with different fluorescent tags,
were used to indicate the genotype of the DNA. However, the unincorporated primers must
still be removed in a downstream process in order to visualize the result. Recently,
Holland et al. developed an assay in which the endogenous 5' exonuclease activity of Taq
DNA polymerase was exploited to cleave a labeled oligonucleotide probe. The probe would
only cleave if PCR amplification had produced its complementary sequence. In order to
detect the cleavage products, however, a subsequent process is again needed.

We have developed a truly homogeneous assay for PCR and PCR product detection based upon
the greatly increased fluorescence that ethidium bromide and other DNA binding dyes
exhibit when they are bound to dsDNA [14-16]. As outlined in Figure 1, a prototype PCR
begins with primers that are single-stranded DNA (ssDNA), dNTPs, and DNA polymerase. An
amount of dsDNA containing the target sequence (target DNA) is also typically present.
This amount can vary, depending on the application, from single-cell amounts of DNA [17]
to micrograms per PCR [18]. If EtBr is present, the reagents that will fluoresce, in order
of increasing fluorescence, are free EtBr itself, and EtBr bound to the single-stranded
DNA primers and to the double-stranded target DNA (by its intercalation between the
stacked bases of the DNA double-helix). After the first denaturation cycle, target DNA
will be largely single-stranded. After a PCR is completed, the most significant change is
the increase in the amount of dsDNA (the PCR product itself) of up to several micrograms.
Formerly free EtBr is bound to the additional dsDNA, resulting in an increase in
fluorescence. There is also some decrease in the amount of ssDNA primer, but because the
binding of EtBr to ssDNA is much less than to dsDNA, the effect of this change on the
total fluorescence of the sample is small. The fluorescence increase can be measured by
directing excitation illumination through the walls of the amplification vessel before and
after, or even continuously during, thermocycling.

[...] increase in total fluorescence.

FIGURE 2 Gel electrophoresis of PCR amplification products of the human, nuclear gene,
HLA DQα, made in the presence of increasing amounts of EtBr (up to 8 µg/ml). The presence
of EtBr has no obvious effect on the yield or specificity of amplification.

FIGURE 3 (A) Fluorescence measurements from PCRs that contain 0.5 µg/ml EtBr and that are
specific for Y-chromosome repeat sequences. Five replicate PCRs were begun containing each
of the DNAs specified. At each indicated cycle, one of the five replicate PCRs for each
DNA was removed from thermocycling and its fluorescence measured. Units of fluorescence
are arbitrary. (B) UV photography of PCR tubes (0.5 ml Eppendorf-style, polypropylene
micro-centrifuge tubes) containing reactions, those starting from 2 ng male DNA and
control reaction without any DNA, from (A).



RESULTS 

PCR in the presence of EtBr. In order to assess the effect of EtBr in PCR, amplifications
of the human HLA DQα gene [19] were performed with the dye present at concentrations from
0.06 to 8.0 µg/ml (a typical concentration of EtBr used in staining of nucleic acids
following gel electrophoresis is 0.5 µg/ml). As shown in Figure 2, gel electrophoresis
revealed little or no difference in the yield or quality of the amplification product
whether EtBr was absent or present at any of these concentrations, indicating that EtBr
does not inhibit PCR.

Detection of human Y-chromosome specific sequences. Sequence-specific, fluorescence
enhancement of EtBr as a result of PCR was demonstrated in a series of amplifications
containing 0.5 µg/ml EtBr and primers specific to repeat DNA sequences found on the human
Y-chromosome [20]. These PCRs initially contained either 60 ng male, 60 ng female, 2 ng
male human or no DNA. Five replicate PCRs were begun for each DNA. After 0, 17, 21, 24 and
29 cycles of thermocycling, a PCR for each DNA was removed from the thermocycler, and its
fluorescence measured in a spectrofluorometer and plotted vs. amplification cycle number
(Fig. 3A). The shape of this curve reflects the fact that by the time an increase in
fluorescence can be detected, the increase in DNA is becoming linear and not exponential
with cycle number. As shown, the fluorescence increased about three-fold over the
background fluorescence for the PCRs containing human male DNA, but did not significantly
increase for negative control PCRs, which contained either no DNA or human female DNA.
The more male DNA present to begin with (60 ng versus 2 ng), the fewer cycles were needed
to give a detectable increase in fluorescence. Gel electrophoresis on the products of
these amplifications showed that DNA fragments of the expected size were made in the male
DNA containing reactions and that little DNA synthesis took place in the control samples.

In addition, the increase in fluorescence was visualized by simply laying the completed,
unopened PCRs on a UV transilluminator and photographing them through a red filter. This
is shown in Figure 3B for the reactions that began with 2 ng male DNA and those with no
DNA.

Detection of specific alleles of the human β-globin gene. In order to demonstrate that
this approach has adequate specificity to allow genetic screening, a detection of the
sickle-cell anemia mutation was performed. Figure 4 shows the fluorescence from completed
amplifications containing EtBr (0.5 µg/ml) as detected by photography of the reaction
tubes on a UV transilluminator. These reactions were performed using primers specific for
either the wild-type or sickle-cell mutation of the human β-globin gene. The specificity
for each allele is imparted by placing the sickle-mutation site at the terminal 3'
nucleotide of one primer. By using an appropriate primer annealing temperature, primer
extension, and thus amplification, can take place only if the 3' nucleotide of the primer
is complementary to the β-globin allele present. Each pair of amplifications shown in
Figure 4 consists of a reaction with either the wild-type allele specific (left tube) or
sickle-allele specific (right tube) primers. Three different DNAs were typed: DNA from a
homozygous, wild-type β-globin individual (AA); from a heterozygous sickle β-globin
individual (AS); and from a homozygous sickle β-globin individual (SS). Each DNA (50 ng
genomic DNA to start each PCR) was analyzed in triplicate (3 pairs of reactions each). The
DNA type was reflected in the relative fluorescence intensities in each pair of completed
amplifications. There was a significant increase in fluorescence only where a β-globin
allele DNA matched the primer set. When measured on a spectrofluorometer (data not shown),
this fluorescence was about three times that present in a PCR where both β-globin alleles
were mismatched to the primer set. Gel electrophoresis (not shown) established that this
increase in fluorescence was due to the synthesis of nearly a microgram of a DNA fragment
of the expected size for β-globin. There was little synthesis of dsDNA in reactions in
which the allele-specific primer was mismatched to both alleles.
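
The typing logic described above (one tube with wild-type-specific primers, one with sickle-specific primers, and a fluorescence increase only where primer and allele match) can be sketched as a small decision rule. This is an illustration only, not the authors' procedure: the function name, the use of a separate background reading, and the 2-fold cutoff (chosen to sit between the mismatched level and the roughly three-fold matched signal reported in the text) are all assumptions.

```python
def call_genotype(wt_signal, sickle_signal, background, fold=2.0):
    """Call AA, AS or SS from one pair of completed reactions.

    wt_signal:     fluorescence of the tube with wild-type-specific primers
    sickle_signal: fluorescence of the tube with sickle-specific primers
    background:    fluorescence of a fully mismatched reaction
    fold:          hypothetical cutoff; a tube counts as positive when its
                   signal is at least fold x background
    """
    wt_positive = wt_signal >= fold * background
    sickle_positive = sickle_signal >= fold * background
    if wt_positive and sickle_positive:
        return "AS"       # both alleles amplified: heterozygous
    if wt_positive:
        return "AA"       # only the wild-type allele amplified
    if sickle_positive:
        return "SS"       # only the sickle allele amplified
    return "no call"      # neither tube rose above the cutoff


# Arbitrary fluorescence units, matched tubes about three times background.
print(call_genotype(3.0, 1.0, 1.0))
print(call_genotype(3.1, 2.9, 1.0))
print(call_genotype(1.1, 3.0, 1.0))
```

A "no call" result is kept separate from the three genotypes so that a failed amplification in both tubes is not mistaken for a genotype.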

Continuous monitoring of a PCR. Using a fiber optic device, it is possible to direct
excitation illumination from a spectrofluorometer to a PCR undergoing thermocycling and to
return its fluorescence to the spectrofluorometer. The fluorescence readout of such an
arrangement, directed at an EtBr-containing amplification of Y-chromosome specific
sequences from 25 ng of human male DNA, is shown in Figure 5. The readout from a control
PCR with no target DNA is also shown. Thirty cycles of PCR were monitored for each.

The fluorescence trace as a function of time clearly shows the effect of the
thermocycling. Fluorescence intensity rises and falls inversely with temperature. The
fluorescence intensity is minimum at the denaturation temperature (94°C) and maximum at
the annealing/extension temperature (50°C). In the negative-control PCR, these
fluorescence maxima and minima do not change significantly over the thirty thermocycles,
indicating that there is little dsDNA synthesis without the appropriate target DNA, and
there is little if any bleaching of EtBr during the continuous illumination of the sample.

In the PCR containing male DNA, the fluorescence maxima at the annealing/extension
temperature begin to increase at about 4000 seconds of thermocycling, and continue to
increase with time, indicating that dsDNA is being produced at a detectable level. Note
that the fluorescence minima at the denaturation temperature do not significantly
increase, presumably because at this temperature there is no dsDNA for EtBr to bind. Thus
the course of the amplification is followed by tracking the fluorescence increase at the
annealing temperature. Analysis of the products of these two amplifications by gel
electrophoresis showed a DNA fragment of the expected size for the male DNA containing
sample and no detectable DNA synthesis for the control sample.
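
Tracking the per-cycle fluorescence maxima, as described above, can be sketched in a few lines. The code below is a hypothetical illustration rather than the data system used in the article: it assumes the trace is available as parallel lists of time stamps and readings, that every thermocycle has the same known duration, and that a 1.2-fold rise over the mean of the first five maxima counts as "beginning to increase". Because the signal peaks at the annealing/extension temperature, the maximum within each cycle is taken as the annealing reading.

```python
def per_cycle_maxima(times, signal, cycle_seconds):
    """Return the highest fluorescence reading within each thermocycle."""
    maxima = {}
    for t, f in zip(times, signal):
        cycle = int(t // cycle_seconds)          # which cycle this reading falls in
        if cycle not in maxima or f > maxima[cycle]:
            maxima[cycle] = f
    return [maxima[c] for c in sorted(maxima)]


def first_rising_cycle(maxima, baseline_cycles=5, fold=1.2):
    """Index of the first cycle whose maximum exceeds fold x the early baseline.

    Returns None when no cycle rises above the cutoff, as expected for a
    control reaction without target DNA.
    """
    if len(maxima) <= baseline_cycles:
        return None
    baseline = sum(maxima[:baseline_cycles]) / baseline_cycles
    for i in range(baseline_cycles, len(maxima)):
        if maxima[i] > fold * baseline:
            return i
    return None


# Synthetic trace: two readings per 10-second "cycle" (low, then high).
highs = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 4.0]
times = [c * 10 + off for c in range(len(highs)) for off in (2, 7)]
signal = [v for h in highs for v in (1.0, h)]
print(first_rising_cycle(per_cycle_maxima(times, signal, 10)))
```

Applying the same function to the minima instead would, per the text, show no rise, since no dsDNA is available for EtBr to bind at the denaturation temperature.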

DISCUSSION 

Downstream processes such as hybridization to a sequence-specific probe can enhance the
specificity of DNA detection by PCR. The elimination of these processes means that the
specificity of this homogeneous assay depends solely on that of PCR. In the case of
sickle-cell disease, we have shown that PCR alone has sufficient DNA sequence specificity
to permit genetic screening. Using appropriate amplification conditions, there is little
non-specific production of dsDNA in the absence of the appropriate target allele.

The specificity required to detect pathogens can be more or less than that required to do
genetic screening, depending on the number of pathogens in the sample and the amount of
other DNA that must be taken with the sample. A difficult target is HIV, which requires
detection of a viral genome that can be at the level of a few copies per thousands of host
cells. Compared with genetic screening, which is performed on cells containing at least
one copy of the target sequence, HIV detection requires both more specificity and the
input of more total DNA (up to microgram amounts) in order to have sufficient numbers of
target sequences. This large amount of starting DNA in an amplification significantly
increases the background fluorescence over which any additional fluorescence produced by
PCR must be detected. An additional complication that occurs with targets in low
copy-number is the formation of the "primer-dimer" artifact. This is the result of the
extension of one primer using the other primer as a template. Although this occurs
infrequently, once it occurs the extension product is a substrate for PCR amplification,
and can compete with true PCR targets if those targets are rare. The primer-dimer product
is of course dsDNA and thus is a potential source of false signal in this homogeneous
assay.

Homozygous AA    Heterozygous AS    Homozygous SS

FIGURE 4 UV photography of PCR tubes containing amplifications using EtBr that are
specific to wild-type (A) or sickle (S) alleles of the human β-globin gene. The left of
each pair of tubes contains allele-specific primers to the wild-type alleles, the right
tube primers to the sickle allele. The photograph was taken after 30 cycles of PCR, and
the input DNAs and the alleles they contain are indicated. Fifty ng of DNA was used to
begin PCR. Typing was done in triplicate (3 pairs of PCRs) for each input DNA.

FIGURE 5 Continuous, real-time monitoring of a PCR. A fiber optic was used to carry
excitation light to a PCR in progress and also emitted light back to a fluorometer (see
Experimental Protocol). Amplification using human male-DNA specific primers in a PCR
starting with 20 ng of human male DNA (top), or in a control PCR without DNA (bottom),
were monitored. Thirty cycles of PCR were followed for each. The temperature cycled
between 94°C (denaturation) and 50°C (annealing and extension). Note in the male DNA PCR,
the cycle (time) dependent increase in fluorescence at the annealing/extension
temperature.

To increase PCR specificity and reduce the effect of primer-dimer amplification, we are
investigating a number of approaches, including the use of nested-primer amplifications
that take place in a single tube, and the "hot-start", in which nonspecific amplification
is reduced by raising the temperature of the reaction before DNA synthesis begins.
Preliminary results using these approaches suggest that primer-dimer is effectively
reduced and it is possible to detect the increase in EtBr fluorescence in a PCR instigated
by a single HIV genome in a background of 10* cells. With larger numbers of cells, the
background fluorescence contributed by genomic DNA becomes problematic. To reduce this
background, it may be possible to use sequence-specific DNA-binding dyes that can be made
to preferentially bind PCR product over genomic DNA by incorporating the dye-binding DNA
sequence into the PCR product through a 5' "add-on" to the oligonucleotide primer.

We have shown that the detection of fluorescence generated by an EtBr-containing PCR is
straightforward, both once PCR is completed and continuously during thermocycling. The
ease with which automation of specific DNA detection can be accomplished is the most
promising aspect of this assay. The fluorescence analysis of completed PCRs is already
possible with existing instrumentation in 96-well format. In this format, the fluorescence
in each PCR can be quantitated before, after, and even at selected points during
thermocycling by moving the rack of PCRs to a 96-microwell plate fluorescence reader.

The instrumentation necessary to continuously monitor multiple PCRs simultaneously is also
simple in principle. A direct extension of the apparatus used here is to have multiple
fiberoptics transmit the excitation light and fluorescent emissions to and from multiple
PCRs. The ability to monitor multiple PCRs continuously may allow quantitation of target
DNA copy number. Figure 3 shows that the larger the amount of starting target DNA, the
sooner during PCR a fluorescence increase is detected. Preliminary experiments (Higuchi
and Dollinger, manuscript in preparation) with continuous monitoring have shown a
sensitivity to two-fold differences in initial target DNA concentration.
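
The relationship between starting amount and the cycle at which fluorescence becomes detectable can be made concrete with a simple model. The sketch below is not taken from the article: it assumes ideal exponential growth with a fixed per-cycle efficiency (perfect doubling by default), whereas the Results note that growth is already becoming linear by the time it is detected. Under that assumption, a two-fold difference in starting target corresponds to a one-cycle shift, and the 30-fold difference between 60 ng and 2 ng corresponds to a shift of roughly five cycles. Function and parameter names are hypothetical.

```python
import math


def cycles_to_threshold(start_copies, threshold_copies, efficiency=1.0):
    """Cycles needed for the product to grow from start to threshold.

    Assumes each cycle multiplies the product by (1 + efficiency), so an
    efficiency of 1.0 means perfect doubling.
    """
    if start_copies <= 0 or threshold_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log(threshold_copies / start_copies) / math.log(1.0 + efficiency)


def cycle_shift(fold_difference, efficiency=1.0):
    """How many cycles earlier a sample with fold_difference more target is detected."""
    return math.log(fold_difference) / math.log(1.0 + efficiency)


print(round(cycle_shift(2.0), 3))         # two-fold more target
print(round(cycle_shift(60.0 / 2.0), 3))  # 60 ng versus 2 ng of male DNA
```

With a lower assumed efficiency the same fold difference spreads over more cycles, which is one reason any real calibration would have to be empirical.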

Conversely, if the number of target molecules is known, as it can be in genetic screening,
continuous monitoring may provide a means of detecting false positive and false negative
results. With a known number of target molecules, a true positive would exhibit detectable
fluorescence by a predictable number of cycles of PCR. Increases in fluorescence detected
before or after that cycle would indicate potential artifacts. False negative results due
to, for example, inhibition of DNA polymerase, may be detected by including within each
PCR an inefficiently amplifying marker. This marker results in a fluorescence increase
only after a large number of cycles, many more than are necessary to detect a true
positive. If a sample fails to have a fluorescence increase after this many cycles,
inhibition may be suspected. Since, in this assay, conclusions are drawn based on the
presence or absence of fluorescence signal alone, such controls may be important. In any
event, before any test based on this principle is ready for the clinic, an assessment of
its false positive/false negative rates will need to be obtained using a large number of
known samples.
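
The control scheme proposed above can be expressed as a short decision rule. The following is a sketch under stated assumptions, not a validated procedure: it assumes a single fluorescence trace per tube, that the cycle of first detectable rise has already been determined (None if no rise was seen through the end of the run), and that a tolerance of two cycles around the expected values is acceptable. All names and the tolerance are hypothetical.

```python
def interpret_run(rise_cycle, expected_cycle, marker_cycle, tolerance=2):
    """Classify one continuously monitored PCR.

    rise_cycle:     cycle at which fluorescence first rose, or None if it never did
    expected_cycle: cycle at which a true positive should become detectable
    marker_cycle:   cycle at which the inefficiently amplifying marker should appear
    tolerance:      hypothetical allowance, in cycles, around both expectations
    """
    if rise_cycle is None:
        # Not even the internal marker amplified.
        return "inhibition suspected"
    if abs(rise_cycle - expected_cycle) <= tolerance:
        return "positive"
    if rise_cycle >= marker_cycle - tolerance:
        # Only the late marker signal appeared: target absent, reaction worked.
        return "negative"
    # A rise well before or after the expected cycle, but ahead of the marker.
    return "possible artifact"


for observed in (21, 35, 10, None):
    print(observed, interpret_run(observed, expected_cycle=20, marker_cycle=35))
```

As the text cautions, any such rule would still need its false positive and false negative rates measured on a large number of known samples.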

In summary, the inclusion in PCR of dyes whose fluorescence is enhanced upon binding dsDNA
makes it possible to detect specific DNA amplification from outside the PCR tube. In the
future, instruments based upon this principle may facilitate the more widespread use of
PCR in applications that demand the high throughput of samples.

EXPERIMENTAL PROTOCOL 

Human HLA-DQα gene amplifications containing EtBr. PCRs were set up in 100 µl volumes
containing 10 mM Tris-HCl, pH 8.3; 50 mM KCl; 4 mM MgCl2; 2.5 units of Taq DNA polymerase
(Perkin-Elmer Cetus, Norwalk, CT); 20 pmole each of human HLA-DQα gene specific
oligonucleotide primers GH26 and GH27 [19] and approximately HP copies of DQα PCR product
diluted from a previous reaction. Ethidium bromide (EtBr; Sigma) was used at the
concentrations indicated in Figure 2. Thermocycling proceeded for 20 cycles in a model 480
thermocycler (Perkin-Elmer Cetus, Norwalk, CT) using a "step-cycle" program of 94°C for
1 min. denaturation and 60°C for 30 sec. annealing and 72°C for 30 sec. extension.
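
The quantities in this protocol can be converted to working concentrations and masses with two unit identities: pmol per µl equals µM, and µg/ml equals ng/µl. The helper names below are hypothetical and the snippet is only a convenience sketch for checking the arithmetic; the inputs are the figures stated in the text (20 pmole of each primer in 100 µl, and EtBr at 0.5 or 8.0 µg/ml).

```python
def molar_conc_uM(pmol, volume_ul):
    """Concentration in µM: pmol per µl is the same as µmol per litre."""
    return pmol / volume_ul


def mass_ng(conc_ug_per_ml, volume_ul):
    """Mass in ng: µg/ml is the same as ng/µl, so multiply by the volume in µl."""
    return conc_ug_per_ml * volume_ul


print(molar_conc_uM(20, 100))  # each primer, in µM
print(mass_ng(0.5, 100))       # EtBr per reaction at 0.5 µg/ml, in ng
print(mass_ng(8.0, 100))       # EtBr per reaction at the highest level tested, in ng
```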

Y-chromosome specific PCR. PCRs (100 µl total reaction volume) containing 0.5 µg/ml EtBr
were prepared as described for HLA-DQα, except with different primers and target DNAs.
These PCRs contained 15 pmole each male DNA-specific primers Y1.1 and Y1.2 [20], and
either 60 ng male, 60 ng female, 2 ng male, or no human DNA. Thermocycling was 94°C for
1 min. and 60°C for 1 min. using a "step-cycle" program. The number of cycles for a sample
were as indicated in Figure 3. Fluorescence measurement is described below.

Allele-specific, human β-globin gene PCR. Amplifications of 100 µl volume using 0.5 µg/ml
of EtBr were prepared as described for HLA-DQα above except with different primers and
target DNAs. These PCRs contained either primer pair HGF2/Hβ14A (wild-type globin specific
primers) or HGF2/Hβ14S (sickle-globin specific primers) at 10 pmole each primer per PCR.
These primers were developed by Wu et al. [21]. Three different target DNAs were used in
separate amplifications: 50 ng each of human DNA that was homozygous for the sickle trait
(SS), DNA that was heterozygous for the sickle trait (AS), or DNA that was homozygous for
the w.t. globin (AA). Thermocycling was for 30 cycles at 94°C for 1 min. and 55°C for
1 min. using a "step-cycle" program. An annealing temperature of 55°C had been shown by
Wu et al. [21] to provide allele-specific amplification. Completed PCRs were photographed
through a red filter (Wratten 23A) after placing the reaction tubes atop a model TM-36
transilluminator (UV-products, San Gabriel, CA).

Fluorescence measurement. Fluorescence measurements were made on PCRs containing EtBr in a
Fluorolog-2 fluorometer (SPEX, Edison, NJ). Excitation was at the 500 nm band with about
2 nm bandwidth with a GG 455 nm cut-off filter (Melles Griot, Inc., Irvine, CA) to exclude
second-order light. Emitted light was detected at 570 nm with a bandwidth of about 7 nm.
An OG 530 nm cut-off filter was used to remove the excitation light.

Continuous fluorescence monitoring of PCR. Continuous monitoring of a PCR in progress was
accomplished using the spectrofluorometer and settings described above as well as a
fiberoptic accessory (SPEX cat. no. 1950) to both send excitation light to, and receive
emitted light from, a PCR placed in a well of a model 480 thermocycler (Perkin-Elmer
Cetus). The [...] the PCR tube and the end of the fiberoptic cable were shielded from room
light and the room lights were kept dimmed during each run. The monitored PCR was an
amplification of Y-chromosome-specific repeat sequences as described above, except using
an annealing/extension temperature of 50°C. The reaction was covered with mineral oil
(2 drops) to prevent evaporation. Thermocycling and fluorescence measurement were started
simultaneously. A time-base scan with a 10 second integration time was used and the
emission signal was ratioed to the excitation signal to control for changes in
light-source intensity. Data were collected using the dm3000f, version 2.5 (SPEX) data
system.

SIMULTANEOUS AMPLIFICATION AND DETECTION OF
SPECIFIC DNA SEQUENCES 

Russell Higuchi*, Gavin Dollinger 1 , P. Sean Walsh and Robert Griffith 

We have enhanced the polymerase chain reaction (PCR) such that specific DNA sequences
can be detected without opening the reaction tube. This enhancement requires the addition
of ethidium bromide (EtBr) to a PCR. Since the fluorescence of EtBr increases in the
presence of double-stranded (ds) DNA, an increase in fluorescence in such a PCR indicates
a positive amplification, which can be easily monitored externally. In fact, amplification
can be continuously monitored in order to follow its progress. The ability to simultaneously
amplify specific DNA sequences and detect the product of the amplification both simplifies
and improves PCR and may facilitate its automation and more widespread use in the clinic
or in other situations requiring high sample throughput.

Although the potential benefits of PCR 1 to clinical
diagnostics are well known 2,3, it is still not widely used
in this setting, even though it is four years since
thermostable DNA polymerases 4 made PCR practical. Some of
the reasons for its slow acceptance are high cost, lack of
automation of pre- and post-PCR processing steps, and false
positive results from carryover-contamination. The first two
points are related in that labor is the largest contributor
to cost at the present stage of PCR development. Most
current assays require some form of "downstream" processing
once thermocycling is done in order to determine whether the
target DNA sequence was present and has amplified. These
include DNA hybridization 5,6, gel electrophoresis with or
without use of restriction digestion 7,8, HPLC 9, or
capillary electrophoresis 10. These methods are
labor-intense, have low throughput, and are difficult to
automate. The third point is also closely related to
downstream processing. The handling of the PCR product in
these downstream processes increases the chances that
amplified DNA will spread through the typing lab, resulting
in a risk of "carryover" false positives in subsequent
testing.

These downstream processing steps would be eliminated if
specific amplification and detection of amplified DNA took
place simultaneously within an unopened reaction vessel.
Assays in which such different processes take place without
the need to separate reaction components have been termed
"homogeneous". No truly homogeneous PCR assay has been
demonstrated to date, although progress towards this end
has been reported. Chehab, et al., developed a PCR product
detection scheme using fluorescent primers that resulted in
a fluorescent PCR product. Allele-specific primers, each
with different fluorescent tags, were used to indicate the
genotype of the DNA. However, the unincorporated primers
must still be removed in a downstream process in order to
visualize the result. Recently, Holland, et al. 15,
developed an assay in which the endogenous 5' exonuclease
activity of Taq DNA polymerase was exploited to cleave a
labeled oligonucleotide probe. The probe would only cleave
if PCR amplification had produced its complementary
sequence. In order to detect the cleavage products,
however, a subsequent process is again needed.

We have developed a truly homogeneous assay for PCR and PCR
product detection based upon the greatly increased
fluorescence that ethidium bromide and other DNA binding
dyes exhibit when they are bound to dsDNA 14-16. As outlined
in Figure 1, a prototypic PCR
FIGURE 1 Principle of simultaneous amplification and detection
of PCR product. The components of a PCR containing EtBr that
are fluorescent are listed: EtBr itself, EtBr bound to either
ssDNA or dsDNA. There is a large fluorescence enhancement when
EtBr is bound to DNA and binding is greatly enhanced when DNA
is double-stranded. After sufficient (n) cycles of PCR, the net
increase in dsDNA results in additional EtBr binding, and a net
increase in total fluorescence.
FIGURE 2 Gel electrophoresis of PCR amplification products of
the human nuclear gene, HLA DQα, made in the presence of
increasing amounts of EtBr (up to 8 µg/ml). The presence of
EtBr has no obvious effect on the yield or specificity of
amplification.
FIGURE 3 (A) Fluorescence measurements from PCRs that contain
0.5 µg/ml EtBr and that are specific for Y-chromosome repeat
sequences. Five replicate PCRs were begun containing each of
the DNAs specified. At each indicated cycle, one of the five
replicate PCRs for each DNA was removed from thermocycling and
its fluorescence measured. Units of fluorescence are arbitrary.
(B) UV photography of PCR tubes (0.5 ml Eppendorf-style,
polypropylene micro-centrifuge tubes) containing reactions,
those starting from 2 ng male DNA and control reactions without
any DNA, from (A).



begins with primers that are single-stranded DNA (ssDNA),
dNTPs, and DNA polymerase. An amount of dsDNA containing the
target sequence (target DNA) is also typically present. This
amount can vary, depending on the application, from
single-cell amounts of DNA 17 to micrograms per PCR 18. If
EtBr is present, the reagents that will fluoresce, in order
of increasing fluorescence, are free EtBr itself, and EtBr
bound to the single-stranded DNA primers and to the
double-stranded target DNA (by its intercalation between the
stacked bases of the DNA double-helix). After the first
denaturation cycle, target DNA will be largely
single-stranded. After a PCR is completed, the most
significant change is the increase in the amount of dsDNA
(the PCR product itself) of up to several micrograms.
Formerly free EtBr is bound to the additional dsDNA,
resulting in an increase in fluorescence. There is also some
decrease in the amount of ssDNA primer, but because the
binding of EtBr to ssDNA is much less than to dsDNA, the
effect of this change on the total fluorescence of the sample
is small. The fluorescence increase can be measured by
directing excitation illumination through the walls of the
amplification vessel before and after, or even continuously
during, thermocycling.



RESULTS 

PCR in the presence of EtBr. In order to assess the effect
of EtBr in PCR, amplifications of the human HLA DQα gene 19
were performed with the dye present at concentrations from
0.06 to 8.0 µg/ml (a typical concentration of EtBr used in
staining of nucleic acids following gel electrophoresis is
0.5 µg/ml). As shown in Figure 2, gel electrophoresis
revealed little or no difference in the yield or quality of
the amplification product whether EtBr was absent or present
at any of these concentrations, indicating that EtBr does
not inhibit PCR.

Detection of human Y-chromosome specific sequences.
Sequence-specific fluorescence enhancement of EtBr as a
result of PCR was demonstrated in a series of amplifications
containing 0.5 µg/ml EtBr and primers specific to repeat DNA
sequences found on the human Y-chromosome 20. These PCRs
initially contained either 60 ng male, 60 ng female, 2 ng
male human or no DNA. Five replicate PCRs were begun for each
DNA. After 0, 17, 21, 24 and 29 cycles of thermocycling, a
PCR for each DNA was removed from the thermocycler, and its
fluorescence measured in a spectrofluorometer and plotted vs.
amplification cycle number (Fig. 3A). The shape of this curve
reflects the fact that by the time an increase in
fluorescence can be detected, the increase in DNA is becoming
linear and not exponential with cycle number. As shown, the
fluorescence increased about three-fold over the background
fluorescence for the PCRs containing human male DNA, but did
not significantly increase for negative control PCRs, which
contained either no DNA or human female DNA. The more male
DNA present to begin with (60 ng versus 2 ng), the fewer
cycles were needed to give a detectable increase in
fluorescence. Gel electrophoresis on the products of these
amplifications showed that DNA fragments of the expected size
were made in the male DNA containing reactions and that
little DNA synthesis took place in the control samples.
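
The dependence of the detection cycle on the amount of starting DNA can be sketched numerically. The following is a minimal illustration only, not the authors' analysis: it assumes ideal amplification in which the product is multiplied by a fixed factor every cycle, and the function name, the `efficiency` parameter and the threshold value are hypothetical choices made for the example.

```python
def cycles_to_reach(start_amount, threshold_amount, efficiency=1.0):
    """Count PCR cycles until the product reaches a detection threshold.

    Assumption (not from the paper): every cycle multiplies the product
    by (1 + efficiency); efficiency=1.0 is ideal doubling. Both amounts
    are in the same arbitrary units.
    """
    if start_amount <= 0 or threshold_amount <= 0:
        raise ValueError("amounts must be positive")
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    cycles = 0
    amount = start_amount
    while amount < threshold_amount:
        amount *= 1.0 + efficiency  # one round of amplification
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical threshold of 1,000,000 arbitrary units.
    for start in (60, 2):
        print(start, cycles_to_reach(start, 1_000_000))
```

Under these idealized assumptions the reaction that starts with 30 times more target crosses the same threshold several cycles earlier, which is the qualitative behavior described for the 60 ng and 2 ng male DNA samples.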

In addition, the increase in fluorescence was visualized by
simply laying the completed, unopened PCRs on a UV
transilluminator and photographing them through a red
filter. This is shown in Figure 3B for the reactions that
began with 2 ng male DNA and those with no DNA.

Detection of specific alleles of the human β-globin gene.
In order to demonstrate that this approach has adequate
specificity to allow genetic screening, a detection of the
sickle-cell anemia mutation was performed. Figure 4 shows
the fluorescence from completed amplifications containing
EtBr (0.5 µg/ml) as detected by photography of the reaction
tubes on a UV transilluminator. These reactions were
performed using primers specific for either the wild-type or
sickle-cell mutation of the human β-globin gene. The
specificity for each allele is imparted by placing the
sickle-mutation site at the terminal 3' nucleotide of one
primer. By using an appropriate primer annealing
temperature, primer extension, and thus amplification, can
take place only if the 3' nucleotide of the primer is
complementary to the β-globin allele present. Each pair of
amplifications shown in Figure 4 consists of a reaction with
either the wild-type allele specific (left tube) or
sickle-allele specific (right tube) primers. Three different
DNAs were typed: DNA from a homozygous, wild-type β-globin
individual (AA); from a heterozygous sickle β-globin
individual (AS); and from a homozygous sickle β-globin
individual (SS). Each DNA (50 ng genomic DNA to start each
PCR) was analyzed in triplicate (3 pairs of reactions each).
The DNA type was reflected in the relative fluorescence
intensities in each pair of completed amplifications. There
was a significant increase in fluorescence only where a
β-globin allele DNA matched the primer set. When measured on
a spectrofluorometer (data not shown), this fluorescence was
about three times that present in a PCR where both β-globin
alleles were mismatched to the primer set. Gel
electrophoresis (not shown) established that this increase
in fluorescence was due to the synthesis of nearly a
microgram of a DNA fragment of the expected size for
β-globin. There was little synthesis of dsDNA in reactions
in which the allele-specific primer was mismatched to both
alleles.

Continuous monitoring of a PCR. Using a fiber optic device,
it is possible to direct excitation illumination from a
spectrofluorometer to a PCR undergoing thermocycling and to
return its fluorescence to the spectrofluorometer. The
fluorescence readout of such an arrangement, directed at an
EtBr-containing amplification of Y-chromosome specific
sequences from 25 ng of human male DNA, is shown in Figure
5. The readout from a control PCR with no target DNA is also
shown. Thirty cycles of PCR were monitored for each.

The fluorescence trace as a function of time clearly shows
the effect of the thermocycling. Fluorescence intensity
rises and falls inversely with temperature. The fluorescence
intensity is minimum at the denaturation temperature (94°C)
and maximum at the annealing/extension temperature (50°C).
In the negative-control PCR, these fluorescence maxima and
minima do not change significantly over the thirty
thermocycles, indicating that there is little dsDNA
synthesis without the appropriate target DNA, and there is
little if any bleaching of EtBr during the continuous
illumination of the sample.

In the PCR containing male DNA, the fluorescence maxima at
the annealing/extension temperature begin to increase at
about 4000 seconds of thermocycling, and continue to
increase with time, indicating that dsDNA is being produced
at a detectable level. Note that the fluorescence minima at
the denaturation temperature do not significantly increase,
presumably because at this temperature there is no dsDNA for
EtBr to bind. Thus the course of the amplification is
followed by tracking the fluorescence increase at the
annealing temperature. Analysis of the products of these two
amplifications by gel electrophoresis showed a DNA fragment
of the expected size for the male DNA containing sample and
no detectable DNA synthesis for the control sample.
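
Tracking the fluorescence maxima cycle by cycle, as described above, can be illustrated with a short sketch. This is a hypothetical reconstruction rather than the procedure used with the SPEX data system: it assumes the trace is available as paired time and fluorescence samples, that every thermocycle has the same known duration, and that a rise is called when a cycle maximum exceeds the early-cycle baseline by an arbitrary factor. All names and default values are illustrative.

```python
def per_cycle_maxima(times, signal, cycle_seconds):
    """Return the maximum fluorescence seen in each thermocycle.

    Assumes (hypothetically) that cycles have a fixed duration, so the
    cycle index of a sample is time // cycle_seconds. The maximum in a
    cycle corresponds to the annealing/extension step, where dsDNA is
    present for EtBr to bind.
    """
    if cycle_seconds <= 0:
        raise ValueError("cycle_seconds must be positive")
    maxima = {}
    for t, f in zip(times, signal):
        cycle = int(t // cycle_seconds)
        if cycle not in maxima or f > maxima[cycle]:
            maxima[cycle] = f
    return [maxima[c] for c in sorted(maxima)]


def first_rising_cycle(maxima, baseline_cycles=5, factor=1.2):
    """Index of the first cycle whose maximum exceeds factor x baseline.

    The baseline is the mean of the first `baseline_cycles` maxima;
    both parameters are illustrative assumptions. Returns None when no
    cycle rises above the cut-off (e.g. a no-DNA control).
    """
    if baseline_cycles <= 0 or len(maxima) < baseline_cycles:
        raise ValueError("not enough cycles to form a baseline")
    baseline = sum(maxima[:baseline_cycles]) / baseline_cycles
    for index in range(baseline_cycles, len(maxima)):
        if maxima[index] > factor * baseline:
            return index
    return None
```

A flat list of maxima, as expected for the control reaction, yields `None`, while a trace whose maxima climb after the early cycles yields the cycle at which the climb is first detected.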

DISCUSSION 

Downstream processes such as hybridization to a
sequence-specific probe can enhance the specificity of DNA
detection by PCR. The elimination of these processes means
that the specificity of this homogeneous assay depends
solely on that of PCR. In the case of sickle-cell disease,
we have shown that PCR alone has sufficient DNA sequence
specificity to permit genetic screening. Using appropriate
amplification conditions, there is little non-specific
production of dsDNA in the absence of the appropriate target
allele.

The specificity required to detect pathogens can be more or
less than that required to do genetic screening, depending
on the number of pathogens in the sample and the amount of
other DNA that must be taken with the sample. A difficult
target is HIV, which requires detection of a viral genome
that can be at the level of a few copies per thousands of
host cells. Compared with genetic screening, which is
performed on cells containing at least one copy of the
target sequence, HIV detection requires both more
specificity and the input of more total
FIGURE 4 UV photography of tubes containing amplifications
using EtBr that are specific to wild-type (A) or sickle (S)
alleles of the human β-globin gene. The left of each pair of
tubes contains allele-specific primers to the wild-type
alleles, the right tube primers to the sickle allele. The
photograph was taken after 30 cycles of PCR, and the input
DNAs and the alleles they contain are indicated. Fifty ng of
DNA was used to begin PCR. Typing was done in triplicate (3
pairs of PCRs) for each input DNA.
FIGURE 5 Continuous, real-time monitoring of a PCR. A fiber
optic was used to carry excitation light to a PCR in progress
and also emitted light back to a fluorometer (see Experimental
Protocol). Amplification using human male-DNA specific primers
in a PCR starting with 20 ng of human male DNA (top), or in a
control PCR without DNA (bottom), were monitored. Thirty cycles
of PCR were followed for each. The temperature cycled between
94°C (denaturation) and 50°C (annealing and extension). Note in
the male DNA PCR, the cycle (time) dependent increase in
fluorescence at the annealing/extension temperature.
DNA (up to microgram amounts) in order to have sufficient
numbers of target sequences. This large amount of starting
DNA in an amplification significantly increases the
background fluorescence over which any additional
fluorescence produced by PCR must be detected. An additional
complication that occurs with targets in low copy-number is
the formation of the "primer-dimer" artifact. This is the
result of the extension of one primer using the other primer
as a template. Although this occurs infrequently, once it
occurs the extension product is a substrate for PCR
amplification, and can compete with true PCR targets if
those targets are rare. The primer-dimer product is of
course dsDNA and thus is a potential source of false signal
in this homogeneous assay.

To increase PCR specificity and reduce the effect of
primer-dimer amplification, we are investigating a number of
approaches, including the use of nested-primer
amplifications that take place in a single tube 9, and the
"hot-start", in which nonspecific amplification is reduced
by raising the temperature of the reaction before DNA
synthesis begins. Preliminary results using these approaches
suggest that primer-dimer is effectively reduced and it is
possible to detect the increase in EtBr fluorescence in a
PCR instigated by a single HIV genome in a background of
10* cells. With larger numbers of cells, the background
fluorescence contributed by genomic DNA becomes problematic.
To reduce this background, it may be possible to use
sequence-specific DNA-binding dyes that can be made to
preferentially bind PCR product over genomic DNA by
incorporating the dye-binding DNA sequence into the PCR
product through a 5' "add-on" to the oligonucleotide primer.

We have shown that the detection of fluorescence generated
by an EtBr-containing PCR is straightforward, both once PCR
is completed and continuously during thermocycling. The ease
with which automation of specific DNA detection can be
accomplished is the most promising aspect of this assay. The
fluorescence analysis of completed PCRs is already possible
with existing instrumentation in 96-well format. In this
format, the fluorescence in each PCR can be quantitated
before, after, and even at selected points during
thermocycling by moving the rack of PCRs to a 96-microwell
plate fluorescence reader.

The instrumentation necessary to continuously monitor
multiple PCRs simultaneously is also simple in principle. A
direct extension of the apparatus used here is to have
multiple fiberoptics transmit the excitation light and
fluorescent emissions to and from multiple PCRs. The ability
to monitor multiple PCRs continuously may allow quantitation
of target DNA copy number. Figure 5 shows that the larger
the amount of starting target DNA, the sooner during PCR a
fluorescence increase is detected. Preliminary experiments
(Higuchi and Dollinger, manuscript in preparation) with
continuous monitoring have shown a sensitivity to two-fold
differences in initial target DNA concentration.
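
The link between the cycle at which fluorescence becomes detectable and the starting copy number can be made concrete with a small calculation. This is a sketch under stated assumptions, not the method of the manuscript in preparation: it assumes both samples amplify with the same constant per-cycle efficiency, and the function and parameter names are hypothetical.

```python
def fold_difference(detect_cycle_a, detect_cycle_b, efficiency=1.0):
    """Estimate how much more target sample A started with than sample B.

    Assumption: each cycle multiplies the product by (1 + efficiency)
    in both samples (efficiency=1.0 means ideal doubling). A sample that
    becomes detectable n cycles earlier then started with
    (1 + efficiency) ** n times more target.
    """
    return (1.0 + efficiency) ** (detect_cycle_b - detect_cycle_a)


if __name__ == "__main__":
    # Sample A is detected one cycle before sample B.
    print(fold_difference(20, 21))
```

With ideal doubling, a two-fold difference in initial target corresponds to a shift of a single cycle, which indicates the resolution that continuous monitoring would need in order to distinguish such samples.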

Conversely, if the number of target molecules is known, as
it can be in genetic screening, continuous monitoring may
provide a means of detecting false positive and false
negative results. With a known number of target molecules, a
true positive would exhibit detectable fluorescence by a
predictable number of cycles of PCR. Increases in
fluorescence detected before or after that cycle would
indicate potential artifacts. False negative results due to,
for example, inhibition of DNA polymerase, may be detected
by including within each PCR an inefficiently amplifying
marker. This marker results in a fluorescence increase only
after a large number of cycles, many more than are necessary
to detect a true positive. If a sample fails to have a
fluorescence increase after this many cycles, inhibition may
be suspected. Since, in this assay, conclusions are drawn
based on the presence or absence of fluorescence signal
alone, such controls may be important. In any event, before
any test based on this principle is ready for the clinic, an
assessment of the false positive/false negative rates will
need to be obtained using a large number of known samples.
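
The control logic proposed in this paragraph can be written out as a simple decision rule. The sketch below is only one possible reading of the text: the category labels, the `tolerance` window and the function name are assumptions introduced for illustration, and no such rule is given in the original.

```python
def classify_run(rise_cycle, expected_cycle, marker_cycle, tolerance=2):
    """Interpret the cycle at which fluorescence first rises.

    rise_cycle     -- cycle of first detectable increase, or None if
                      no increase was seen by the end of the run
    expected_cycle -- cycle predicted for a true positive with the
                      known number of target molecules
    marker_cycle   -- cycle at which the inefficiently amplifying
                      marker alone is expected to give a signal
    tolerance      -- allowed deviation in cycles (hypothetical value)
    """
    if rise_cycle is None:
        # Not even the marker amplified: polymerase may be inhibited.
        return "inhibition suspected"
    if abs(rise_cycle - expected_cycle) <= tolerance:
        return "true positive"
    if rise_cycle >= marker_cycle - tolerance:
        # Signal appears only where the marker is expected.
        return "negative (marker only)"
    # Signal too early, or too late without reaching the marker cycle.
    return "potential artifact"
```

For example, with an expected cycle of 25 and a marker cycle of 40, a rise at cycle 15 or 32 would be flagged as a potential artifact, while a run with no rise at all would be flagged for possible inhibition.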

In summary, the inclusion in PCR of dyes whose fluorescence
is enhanced upon binding dsDNA makes it possible to detect
specific DNA amplification from outside the PCR tube. In the
future, instruments based upon this principle may facilitate
the more widespread use of PCR in applications that demand
the high throughput of samples.

EXPERIMENTAL PROTOCOL 

Human HLA-DQα gene amplifications containing EtBr.
PCRs were set up in 100 µl volumes containing 10 mM Tris-HCl,
pH 8.3; 50 mM KCl; 4 mM MgCl2; 2.5 units of Taq DNA
polymerase (Perkin-Elmer Cetus, Norwalk, CT); 20 pmole each
of human HLA-DQα gene specific oligonucleotide primers
GH26 and GH27 19 and approximately NT copies of DQα PCR
product diluted from a previous reaction. Ethidium bromide
(EtBr; Sigma) was used at the concentrations indicated in Figure
2. Thermocycling proceeded for 20 cycles in a model 480
thermocycler (Perkin-Elmer Cetus, Norwalk, CT) using a
"step-cycle" program of 94°C for 1 min. denaturation and WrC
for 30 sec. annealing and 72°C for 30 sec. extension.

Y-chromosome specific PCR. PCRs (100 µl total reaction
volume) containing 0.5 µg/ml EtBr were prepared as described
for HLA-DQα, except with different primers and target DNAs.
These PCRs contained 15 pmole each male DNA-specific primers
Y1.1 and Y1.2 20, and either 60 ng male, 60 ng female, 2 ng
male, or no human DNA. Thermocycling was 94°C for 1 min. and
SO^C for 1 min. using a "step-cycle" program. The number of
cycles for a sample were as indicated in Figure 3.
Fluorescence measurement is described below.

Allele-specific, human β-globin gene PCR. Amplifications of
100 µl volume using 0.5 µg/ml of EtBr were prepared as
described for HLA-DQα above except with different primers and
target DNAs. These PCRs contained either primer pair
HGP2/Hβ14A (wild-type globin specific primers) or HGP2/Hβ14S
(sickle-globin specific primers) at 10 pmole each primer per
PCR. These primers were developed by Wu et al. 21. Three
different target DNAs were used in separate amplifications:
50 ng each of human DNA that was homozygous for the sickle
trait (SS), DNA that was heterozygous for the sickle trait
(AS), or DNA that was homozygous for the w.t. globin (AA).
Thermocycling was for 30 cycles at 94°C for 1 min. and 55°C
for 1 min. using a "step-cycle" program. An annealing
temperature of 55°C had been shown by Wu et al. 21 to provide
allele-specific amplification. Completed PCRs were
photographed through a red filter (Wratten 23A) after placing
the reaction tubes atop a model TM-36 transilluminator
(UV-products, San Gabriel, CA).

Fluorescence measurement. Fluorescence measurements were
made on PCRs containing EtBr in a Fluorolog-2 fluorometer
(SPEX, Edison, NJ). Excitation was at the 500 nm band with
about 2 nm bandwidth with a OG 4S5 nm cutoff filter (Melles
Griot, Inc., Irvine, CA) to exclude second-order light.
Emitted light was detected at 5 Vf) nm with a bandwidth of
about 7 nm. An OG 530 nm cut-off filter was used to remove
the excitation light.

Continuous fluorescence monitoring of PCR. Continuous
monitoring of a PCR in progress was accomplished using the
spectrofluorometer and settings described above as well as a
fiberoptic accessory (SPEX cat. no. 1950) to both send
excitation light to, and receive emitted light from, a PCR
placed in a well of a thermocycler (Perkin-Elmer Cetus). The
probe end of the fiberoptic cable was attached with "5
minute-epoxy" to the open top of a PCR tube (a 0.5 ml
polypropylene centrifuge tube with its cap removed),
effectively sealing it. The exposed top of the PCR tube and
the end of the fiberoptic cable were shielded from room light
and the room lights were kept dimmed during each run. The
monitored PCR was an amplification of Y-chromosome-specific
repeat sequences as described above, except using an
annealing/extension temperature of 50°C. The reaction was
covered with mineral oil (2 drops) to prevent evaporation.
Thermocycling and fluorescence measurement were started
simultaneously. A time-base scan with a 10 second integration
time was used and the emission signal was ratioed to the
excitation signal to control for changes in light-source
intensity. Data were collected using the dm3000f, version 2.5
(SPEX) data system.
Acknowledgments

We thank Bob Jones for help with the spectrofluorometric
measurements and Heatherbell Fong for editing this manuscript.
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Oligonucleotides with Fluorescent Dyes at 
Opposite Ends Provide a Quenched Probe 
System Useful for Detecting PCR Product 
and Nucleic Acid Hybridization 

Kenneth J. Livak, Susan J.A. Flood, Jeffrey Marmaro, William Giusti, and Karin Deetz 

Perkin-Elmer, Applied Biosystems Division, Foster City, California 94404 



The 5' nuclease PCR assay detects the
accumulation of specific PCR product
by hybridization and cleavage of a
double-labeled fluorogenic probe
during the amplification reaction.
The probe is an oligonucleotide with
both a reporter fluorescent dye and a
quencher dye attached. An increase
in reporter fluorescence intensity
indicates that the probe has hybridized
to the target PCR product and has
been cleaved by the 5' → 3' nucleolytic
activity of Taq DNA polymerase.
In this study, probes with the
quencher dye attached to an internal
nucleotide were compared with
probes with the quencher dye attached
to the 3'-end nucleotide. In all
cases, the reporter dye was attached
to the 5' end. All intact probes
showed quenching of the reporter
fluorescence. In general, probes with
the quencher dye attached to the
3'-end nucleotide exhibited a larger
signal in the 5' nuclease PCR assay than
the internally labeled probes. It is
proposed that the larger signal is
caused by increased likelihood of
cleavage by Taq DNA polymerase
when the probe is hybridized to a
template strand during PCR. Probes
with the quencher dye attached to
the 3'-end nucleotide also exhibited
an increase in reporter fluorescence
intensity when hybridized to a
complementary strand. Thus,
oligonucleotides with reporter and quencher
dyes attached at opposite ends can
be used as homogeneous hybridization
probes.



A homogeneous assay for detecting
the accumulation of specific PCR product
that uses a double-labeled fluorogenic
probe was described by Lee et al.(1)
The assay exploits the 5' → 3' nucleolytic
activity of Taq DNA polymerase(2,3)
and is diagramed in Figure 1.
The fluorogenic probe consists of an
oligonucleotide with a reporter fluorescent
dye, such as a fluorescein, attached to
the 5' end; and a quencher dye, such as a
rhodamine, attached internally. When
the fluorescein is excited by irradiation,
its fluorescent emission will be
quenched if the rhodamine is close
enough to be excited through the process
of fluorescence energy transfer
(FET).(4,5) During PCR, if the probe is
hybridized to a template strand, Taq DNA
polymerase will cleave the probe because
of its inherent 5' → 3' nucleolytic
activity. If the cleavage occurs between
the fluorescein and rhodamine dyes, it
causes an increase in fluorescein
fluorescence intensity because the fluorescein
is no longer quenched. The increase in
fluorescein fluorescence intensity indicates
that the probe-specific PCR product
has been generated. Thus, FET between a
reporter dye and a quencher dye is critical
to the performance of the probe in
the 5' nuclease PCR assay.

Quenching is completely dependent 
on the physical proximity of the two 
dyes. (6) Because of this, it has been as- 
sumed that the quencher dye must be 
attached near the 5' end. Surprisingly, 
we have found that attaching a rho- 
damine dye at the 3' end of a probe 
still provides adequate quenching for 
the probe to perform in the 5' nuclease 



PCR assay. Furthermore, cleavage of this 
type of probe is not required to achieve 
some reduction in quenching. Oligonu- 
cleotides with a reporter dye on the 5' 
end and a quencher dye on the 3' end 
exhibit a much higher reporter fluores- 
cence when double-stranded as com- 
pared with single-stranded. This should 
make it possible to use this type of dou- 
ble-labeled probe for homogeneous de- 
tection of nucleic acid hybridization. 
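
The steep dependence of quenching on dye separation can be illustrated with the standard Förster relation for energy-transfer efficiency, E = 1 / (1 + (r/R0)^6). This relation is textbook background and is not stated in the article; the Förster radius used in the example is a hypothetical value for illustration and is not a measured property of the 6-FAM/TAMRA pair.

```python
def transfer_efficiency(distance, forster_radius):
    """Förster energy-transfer efficiency for a donor/acceptor pair.

    distance       -- separation between reporter and quencher
    forster_radius -- separation at which transfer is 50% efficient
                      (same units as distance; value is an assumption)
    """
    if distance < 0 or forster_radius <= 0:
        raise ValueError("distance must be >= 0 and radius > 0")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


if __name__ == "__main__":
    # Hypothetical Förster radius of 5 (arbitrary length units).
    for r in (2.5, 5.0, 10.0):
        print(r, round(transfer_efficiency(r, 5.0), 3))
```

Because efficiency falls with the sixth power of distance, halving or doubling the separation around the Förster radius moves the system from nearly complete to nearly absent transfer, which is why quenching by a dye at the far end of the probe was unexpected.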



MATERIALS AND METHODS 
Oligonucleotides 

Table 1 shows the nucleotide sequence 
of the oligonucleotides used in this 
study. Linker arm nucleotide (LAN) 
phosphoramidite was obtained from 
Glen Research. The standard DNA phos- 
phoramidites, 6-carboxyfluorescein (6- 
FAM) phosphoramidite, 6-carboxytet- 
ramethylrhodamine succinimidyl ester 
(TAMRA NHS ester), and Phosphalink 
for attaching a 3'-blocking phosphate, 
were obtained from Perkin-Elmer, Applied
Biosystems Division. Oligonucleotide
synthesis was performed using an
ABI model 394 DNA synthesizer (Applied 
Biosystems). Primer and complement 
oligonucleotides were purified using 
Oligo Purification Cartridges (Applied 
Biosystems). Double-labeled probes were 
synthesized with 6-FAM-labeled phos- 
phoramidite at the 5' end, LAN replacing 
one of the T's in the sequence, and Phos- 
phalink at the 3' end. Following de- 
protection and ethanol precipitation, 
TAMRA NHS ester was coupled to the 
LAN-containing oligonucleotide in 250 
FIGURE 1 Diagram of 5' nuclease assay. Stepwise representation of the 5' → 3' nucleolytic
activity of Taq DNA polymerase acting on a fluorogenic probe during one extension phase of PCR.



mM Na-bicarbonate buffer (pH 9.0) at
room temperature. Unreacted dye was
removed by passage over a PD-10 Sephadex
column. Finally, the double-labeled
probe was purified by preparative
high-performance liquid chromatography
(HPLC) using an Aquapore C8 220 x 4.6-mm
column with 7-µm particle size. The
column was developed with a 24-min
linear gradient of 8-20% acetonitrile in
0.1 M TEAA (triethylamine acetate).
Probes are named by designating the
sequence from Table 1 and the position of
the LAN-TAMRA moiety. For example,
probe A1-7 has sequence A1 with
LAN-TAMRA at nucleotide position 7 from the
5' end.



PCR Systems 

All PCR amplifications were performed
in the Perkin-Elmer GeneAmp PCR System
9600 using 50-µl reactions that contained
10 mM Tris-HCl (pH 8.3), 50 mM
KCl, 200 µM dATP, 200 µM dCTP, 200 µM
dGTP, 400 µM dUTP, 0.5 unit of AmpErase
uracil N-glycosylase (Perkin-Elmer),
and 1.25 unit of AmpliTaq DNA polymerase
(Perkin-Elmer). A 295-bp segment
from exon 3 of the human β-actin
gene (nucleotides 2141-2435 in the
sequence of Nakajima-Iijima et al.)(7) was
amplified using primers AFP and ARP
(Table 1), which are modified slightly
from those of du Breuil et al.(8) Actin
amplification reactions contained 4 mM
MgCl2, 20 ng of human genomic DNA,
50 nM A1 or A3 probe, and 300 nM each



TABLE 1 Sequences of Oligonucleotides 



primer. The thermal regimen was 50°C
(2 min), 95°C (10 min), 40 cycles of 95°C
(20 sec), 60°C (1 min), and hold at 72°C.
A 515-bp segment was amplified from a
plasmid that consists of a segment of λ
DNA (nucleotides 32,220-32,747) inserted
in the SmaI site of vector pUC119.
These reactions contained 3.5 mM
MgCl2, 1 ng of plasmid DNA, 50 nM P2 or
P5 probe, 200 nM primer F119, and 200
nM primer R119. The thermal regimen
was 50°C (2 min), 95°C (10 min), 25 cycles
of 95°C (20 sec), 57°C (1 min), and
hold at 72°C.



Fluorescence Detection 

For each amplification reaction, a 40-µl
aliquot of a sample was transferred to an
individual well of a white, 96-well
microtiter plate (Perkin-Elmer). Fluorescence
was measured on the Perkin-Elmer TaqMan
LS-50B System, which consists of a
luminescence spectrometer with plate
reader assembly, a 485-nm excitation filter,
and a 515-nm emission filter. Excitation
was at 488 nm using a 5-nm slit
width. Emission was measured at 518
nm for 6-FAM (the reporter or R value)
and 582 nm for TAMRA (the quencher or
Q value) using a 10-nm slit width. To
determine the increase in reporter emis- 
sion that is caused by cleavage of the 
probe during PCR, three normalizations 
are applied to the raw emission data. 
First, emission intensity of a buffer blank 
is subtracted for each wavelength. Sec- 
ond, emission intensity of the reporter is 



Name    Type    Sequence

ACCCACAGGAACTGATCACCACTC
ATGTCGCGTTCCGGCTGACGTTGTGC
TCGCATTACTGATCGTTGCCAACCAGTp
GTACTGGTTGGCAACGATCAGTAATGCGATG
CGGATTTGCTGGTATCTATGACAAGGATp
TTCATCCTTGTCATAGATACCAGCAAATCCG

TCACCCACACTGTGCCCATCTACGA
CAGCGGAACCGCTCATTGCCAATGG
ATGCCCTCCCCCATGCCATCCTGCGTp
AGACGCAGGATGGCATGGGGGAGGGCATAC

CGCCCTGGACTTCGAGCAAGAGATp
CCATCTCTTGCTCGAAGTCCAGGGCGAC

For each oligonucleotide used in this study, the nucleic acid sequence is given, written in
the 5' → 3' direction. There are three types of oligonucleotides: PCR primer, probe used
in the 5' nuclease assay, and complement used to hybridize to the corresponding probe. For the
probes, the underlined base indicates a position where LAN with TAMRA attached was
substituted for a T. (p) The presence of a 3' phosphate on each probe.
A1-2 RAQGCCCTCCCCCATGCCATCCTGCGTp

A1-7 RATGCCCQCCCCCATGCCATCCTGCGTp

A1-14 RATGCCCTCCCCCAQGCCATCCTGCGTp

A1-19 RATGCCCTCCCCCATGCCAQCCTGCGTp

A1-22 RATGCCCTCCCCCATGCCATCCQGCGTp

A1-26 RATGCCCTCCCCCATGCCATCCTGCGQp



         518 nm                       582 nm
Probe    no temp.       + temp.       no temp.       + temp.       RQ⁻           RQ⁺           ΔRQ

A1-2     25.5 ± 2.1     32.7 ± 1.9    38.2 ± 3.0     38.2 ± 2.0    0.67 ± 0.01   0.86 ± 0.06   0.19 ± 0.06
A1-7     53.5 ± 4.3     395.1 ± 21.4  108.5 ± 6.3    110.3 ± 5.3   0.49 ± 0.03   3.58 ± 0.17   3.09 ± 0.18
A1-14    127.0 ± 4.9    403.5 ± 19.1  108.7 ± 55     93.1 ± 6.3    1.16 ± 0.02   4.34 ± 0.15   3.18 ± 0.15
A1-19    187.5 ± 17.9   422.7 ± 7.7   70.3 ± 7.4     73.0 ± 2.8    2.67 ± 0.05   5.80 ± 0.15   3.13 ± 0.16
A1-22    224.6 ± 9.4    482.2 ± 43.6  100.0 ± 4.0    96.2 ± 9.6    2.25 ± 0.03   5.02 ± 0.11   2.77 ± 0.12
A1-26    160.2 ± 8.9    454.1 ± 18.4  93.1 ± 5.4     90.7 ± 3.2    1.72 ± 0.02   5.01 ± 0.08   3.29 ± 0.08


FIGURE 2 Results of 5' nuclease assay comparing β-actin probes with TAMRA at different
nucleotide positions. As described in Materials and Methods, PCR amplifications containing the
indicated probes were performed, and the fluorescence emission was measured at 518 and 582 nm.
Reported values are the average ± 1 s.d. for six reactions run without added template (no temp.)
and six reactions run with template (+ temp.). The RQ ratio was calculated for each individual
reaction and averaged to give the reported RQ⁻ and RQ⁺ values.



divided by the emission intensity of the
quencher to give an RQ ratio for each
reaction tube. This normalizes for
well-to-well variations in probe concentration
and fluorescence measurement.
Finally, ΔRQ is calculated by subtracting
the RQ value of the no-template control
(RQ⁻) from the RQ value for the complete
reaction including template
(RQ⁺).
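
The three normalizations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' software: the function names are hypothetical, and the example inputs are the mean emissions reported for probe A1-7 in Figure 2, treated here as already blank-subtracted (an assumption, since the blank readings are not given).

```python
def rq_ratio(reporter, quencher, blank_reporter=0.0, blank_quencher=0.0):
    """Blank-subtract each channel, then divide reporter (518 nm) by quencher (582 nm)."""
    r = reporter - blank_reporter
    q = quencher - blank_quencher
    if q <= 0:
        raise ValueError("quencher emission must exceed the buffer blank")
    return r / q


def delta_rq(rq_plus, rq_minus):
    """Delta RQ = RQ+ (reaction with template) minus RQ- (no-template control)."""
    return rq_plus - rq_minus


# Illustrative inputs: Figure 2 means for probe A1-7 (assumed blank-subtracted).
rq_minus = rq_ratio(53.5, 108.5)    # no-template control
rq_plus = rq_ratio(395.1, 110.3)    # complete reaction with template
print(round(rq_minus, 2), round(rq_plus, 2), round(delta_rq(rq_plus, rq_minus), 2))
```

Note that Figure 2 reports the average of per-reaction ratios, so a ratio of mean emissions such as this one agrees with the published values only approximately.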

RESULTS 

A series of probes with increasing distances
between the fluorescein reporter
and rhodamine quencher were tested to
investigate the minimum and maximum
spacing that would give an acceptable
performance in the 5' nuclease PCR assay.
These probes hybridize to a target
sequence in the human β-actin gene.
Figure 2 shows the results of an experiment
in which these probes were included
in PCR that amplified a segment
of the β-actin gene containing the target
sequence. Performance in the 5' nuclease
PCR assay is monitored by the
magnitude of ΔRQ, which is a measure
of the increase in reporter fluorescence
caused by PCR amplification of the
probe target. Probe A1-2 has a ΔRQ value
that is close to zero, indicating that the
probe was not cleaved appreciably during
the amplification reaction. This suggests
that with the quencher dye on the
second nucleotide from the 5' end, there
is insufficient room for Taq polymerase
to cleave efficiently between the reporter
and quencher. The other five probes
exhibited comparable ΔRQ values that are
clearly different from zero. Thus, all five
probes are being cleaved during PCR
amplification, resulting in a similar increase
in reporter fluorescence. It should be
noted that complete digestion of a probe
produces a much larger increase in reporter
fluorescence than that observed
in Figure 2 (data not shown). Thus, even
in reactions where amplification occurs,
the majority of probe molecules remain
uncleaved. It is mainly for this reason
that the fluorescence intensity of the
quencher dye TAMRA changes little with
amplification of the target. This is what
allows us to use the 582-nm fluorescence
reading as a normalization factor.

The magnitude of RQ⁻ depends
mainly on the quenching efficiency inherent
in the specific structure of the
probe and the purity of the oligonucleotide.
Thus, the larger RQ⁻ values indicate
that probes A1-14, A1-19, A1-22, and
A1-26 probably have reduced quenching
as compared with A1-7. Still, the degree
of quenching is sufficient to detect a
highly significant increase in reporter
fluorescence when each of these probes
is cleaved during PCR.

To further investigate the ability of
TAMRA on the 3' end to quench 6-FAM
on the 5' end, three additional pairs of
probes were tested in the 5' nuclease
PCR assay. For each pair, one probe has
TAMRA attached to an internal nucleotide
and the other has TAMRA attached
to the 3' end nucleotide. The results are
shown in Table 2. For all three sets, the
probe with the 3' quencher exhibits a
ΔRQ value that is considerably higher
than for the probe with the internal
quencher. The RQ⁻ values suggest that
differences in quenching are not as great
as those observed with some of the A1
probes. These results demonstrate that a
quencher dye on the 3' end of an
oligonucleotide can quench efficiently the



TABLE 2 Results of 5' Nuclease Assay Comparing Probes with TAMRA Attached to an Internal or 3'-terminal Nucleotide

         518 nm                       582 nm
Probe    no temp.       + temp.       no temp.       + temp.        RQ⁻           RQ⁺           ΔRQ

A3-6     54.6 ± 3.2     84.8 ± 3.7    116.2 ± 6.4    115.6 ± 2.5    0.47 ± 0.02   0.73 ± 0.03   0.26 ± 0.04
A3-24    72.1 ± 2.9     236.5 ± 11.1  84.2 ± 4.0     90.2 ± 3.8     0.86 ± 0.02   2.62 ± 0.05   1.76 ± 0.05
P2-7     82.8 ± 4.4     384.0 ± 34.1  105.1 ± 6.4    120.4 ± 10.2   0.79 ± 0.02   3.19 ± 0.16   2.40 ± 0.16
P2-27    113.4 ± 6.6    555.4 ± 14.1  140.7 ± 8.5    118.7 ± 4.8    0.81 ± 0.01   4.68 ± 0.10   3.88 ± 0.10
P5-10    77.5 ± 6.5     244.4 ± 15.9  86.7 ± 4.3     95.8 ± 6.7     0.89 ± 0.05   2.55 ± 0.06   1.66 ± 0.08
P5-28    64.0 ± 5.2     333.6 ± 12.1  100.6 ± 6.1    94.7 ± 6.3     0.63 ± 0.02   3.53 ± 0.12   2.89 ± 0.13

Reactions containing the indicated probes and calculations were performed as described in Materials and Methods and in the legend to Fig. 2.
fluorescence of a reporter dye on the 5'
end. The degree of quenching is suffi- 
cient for this type of oligonucleotide to 
be used as a probe in the 5' nuclease PCR 
assay. 

To test the hypothesis that quenching 
by a 3' TAMRA depends on the flexibility 
of the oligonucleotide, fluorescence was 
measured for probes in the single- 
stranded and double-stranded states. Ta- 
ble 3 reports the fluorescence observed 
at 518 and 582 nm. The relative degree 
of quenching is assessed by calculating 
the RQ ratio. For probes with TAMRA 
6-10 nucleotides from the 5' end, there 
is little difference in the RQ values when 
comparing single-stranded with double- 
stranded oligonucleotides. The results 
for probes with TAMRA at the 3' end are 
much different. For these probes, hy- 
bridization to a complementary strand 
causes a dramatic increase in RQ. We 
propose that this loss of quenching is 
caused by the rigid structure of double- 
stranded DNA, which prevents the 5' 
and 3' ends from being in proximity. 
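
As a rough worked illustration of this comparison, the snippet below recomputes RQ (518-nm emission divided by 582-nm emission) for a few probes using emission values from Table 3 and reports the double-stranded to single-stranded ratio. The dictionary layout and function names are hypothetical, and because Table 3 lists averages of three determinations, ratios computed from those averages match the tabulated RQ values only approximately.

```python
# (518 nm, 582 nm) emissions from Table 3; ss = single-stranded, ds = double-stranded.
TABLE3 = {
    "A1-7":  {"ss": (27.75, 61.08), "ds": (68.53, 138.18)},   # internal TAMRA
    "P2-7":  {"ss": (35.02, 54.63), "ds": (70.13, 121.09)},   # internal TAMRA
    "P2-27": {"ss": (39.89, 65.10), "ds": (320.47, 61.13)},   # 3' TAMRA
    "P5-28": {"ss": (33.65, 72.39), "ds": (462.29, 104.61)},  # 3' TAMRA
}


def rq(emission_518, emission_582):
    """RQ ratio: reporter emission divided by quencher emission."""
    return emission_518 / emission_582


def ds_over_ss(probe):
    """Fold change in RQ when the probe is hybridized to its complement."""
    row = TABLE3[probe]
    return rq(*row["ds"]) / rq(*row["ss"])


for name in TABLE3:
    print(f"{name}: RQ(ds)/RQ(ss) = {ds_over_ss(name):.1f}")
```

With these inputs the internally labeled probes stay near a ratio of 1, while the 3'-labeled probes rise several-fold, which is the pattern the text describes.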

When TAMRA is placed toward the 3'
end, there is a marked Mg²⁺ effect on
quenching. Figure 3 shows a plot of observed
RQ values for the A1 series of
probes as a function of Mg²⁺ concentration.
With TAMRA attached near the 5'
end (probe A1-2 or A1-7), the RQ value at
0 mM Mg²⁺ is only slightly higher than
RQ at 10 mM Mg²⁺. For probes A1-19,
A1-22, and A1-26, the RQ values at 0 mM
Mg²⁺ are very high, indicating a much
reduced quenching efficiency. For each
of these probes, there is a marked decrease
in RQ at 1 mM Mg²⁺ followed by
a gradual decline as the Mg²⁺ concentration
increases to 10 mM. Probe A1-14
shows an intermediate RQ value at 0 mM
Mg²⁺ with a gradual decline at higher
Mg²⁺ concentrations. In a low-salt environment
with no Mg²⁺ present, a single-stranded
oligonucleotide would be
expected to adopt an extended conformation
because of electrostatic repulsion.
The binding of Mg²⁺ ions acts to
shield the negative charge of the phosphate
backbone so that the oligonucleotide
can adopt conformations where
the 3' end is close to the 5' end. Therefore,
the observed Mg²⁺ effects support
the notion that quenching of a 5' reporter
dye by TAMRA at or near the 3'
end depends on the flexibility of the
oligonucleotide.

DISCUSSION 

The striking finding of this study is that
it seems the rhodamine dye TAMRA,
placed at any position in an oligonucleotide,
can quench the fluorescent emission
of a fluorescein (6-FAM) placed at
the 5' end. This implies that a single-stranded,
double-labeled oligonucleotide
must be able to adopt conformations
where the TAMRA is close to the 5'
end. It should be noted that the decay of
6-FAM in the excited state requires a certain
amount of time. Therefore, what



TABLE 3 Comparison of Fluorescence Emissions of Single-stranded and
Double-stranded Fluorogenic Probes

          518 nm              582 nm              RQ
Probe     ss       ds         ss       ds         ss      ds

A1-7      27.75    68.53      61.08    138.18     0.45    0.50
A1-26     43.31    509.?8     53.50    93.86      0.81    5.43
A3-6      16.75    62.88      39.33    165.57     0.43    0.38
A3-24     30.05    578.64     67.72    140.25     0.45    3.21
P2-7      35.02    70.13      54.63    121.09     0.64    0.58
P2-27     39.89    320.47     65.10    61.13      0.61    5.25
P5-10     27.34    144.85     61.95    165.54     0.44    0.87
P5-28     33.65    462.29     72.39    104.61     0.46    4.43

(ss) Single-stranded. The fluorescence emissions at 518 or 582 nm for solutions containing a final
concentration of 50 nM indicated probe, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, and 10 mM MgCl₂.
(ds) Double-stranded. The solutions contained, in addition, 100 nM A1C for probes A1-7 and
A1-26, 100 nM A3C for probes A3-6 and A3-24, 100 nM P2C for probes P2-7 and P2-27, or 100 nM
P5C for probes P5-10 and P5-28. Before the addition of MgCl₂, 120 µl of each sample was heated
at 95°C for 5 min. Following the addition of 80 µl of 25 mM MgCl₂, each sample was allowed to
cool to room temperature and the fluorescence emissions were measured. Reported values are
the average of three determinations.



matters for quenching is not the average 
distance between 6-FAM and TAMRA 
but, rather, how close TAMRA can get to 
6-FAM during the lifetime of the 6-FAM 
excited state. As long as the decay time of
the excited state is relatively long compared
with the molecular motions of the
oligonucleotide, quenching can occur.
Thus, we propose that TAMRA at the 3' 
end, or any other position, can quench 
6-FAM at the 5' end because TAMRA is in 
proximity to 6-FAM often enough to be 
able to accept energy transfer from an 
excited 6-FAM. 

Details of the fluorescence measurements
remain puzzling. For example, Table 3
shows that hybridization of probes
A1-26, A3-24, and P5-28 to their complementary
strands not only causes a large
increase in 6-FAM fluorescence at 518
nm but also causes a modest increase in
TAMRA fluorescence at 582 nm. If
TAMRA is being excited by energy transfer
from quenched 6-FAM, then loss of
quenching attributable to hybridization
should cause a decrease in the fluorescence
emission of TAMRA. The fact that
the fluorescence emission of TAMRA increases
indicates that the situation is
more complex. For example, we have anecdotal
evidence that the bases of the
oligonucleotide, especially G, quench
the fluorescence of both 6-FAM and
TAMRA to some degree. When double-stranded,
base-pairing may reduce the
ability of the bases to quench. The primary
factor causing the quenching of
6-FAM in an intact probe is the TAMRA
dye. Evidence for the importance of
TAMRA is that 6-FAM fluorescence
remains relatively unchanged when
probes labeled only with 6-FAM are used
in the 5' nuclease PCR assay (data not
shown). Secondary effectors of fluorescence,
both before and after cleavage of
the probe, need to be explored further.

Regardless of the physical mechanism,
the relative independence of position
and quenching greatly simplifies
the design of probes for the 5' nuclease
PCR assay. There are three main factors
that determine the performance of a
double-labeled fluorescent probe in the
5' nuclease PCR assay. The first factor is
the degree of quenching observed in the
intact probe. This is characterized by the
value of RQ⁻, which is the ratio of reporter
to quencher fluorescent emissions
for a no-template control PCR.
Influences on the value of RQ⁻ include
the particular reporter and quencher
FIGURE 3 Effect of Mg²⁺ concentration on RQ ratio for the A1 series of probes. The fluorescence
emission intensity at 518 and 582 nm was measured for solutions containing 50 nM probe, 10 mM
Tris-HCl (pH 8.3), 50 mM KCl, and varying amounts (0-10 mM) of MgCl₂. The calculated RQ
ratios (518 nm intensity divided by 582 nm intensity) are plotted vs. MgCl₂ concentration (mM
Mg). The key (upper right) shows the probes examined.



dyes used, spacing between reporter and
quencher dyes, nucleotide sequence
context effects, presence of structure or
other factors that reduce flexibility of
the oligonucleotide, and purity of the
probe. The second factor is the efficiency
of hybridization, which depends on
probe Tm, presence of secondary structure
in probe or template, annealing
temperature, and other reaction conditions.
The third factor is the efficiency at
which Taq DNA polymerase cleaves the
bound probe between the reporter and
quencher dyes. This cleavage is dependent
on sequence complementarity between
probe and template as shown by
the observation that mismatches in the
segment between reporter and quencher
dyes drastically reduce the cleavage of
probe.(1)

The rise in RQ⁻ values for the A1 series
of probes seems to indicate that the
degree of quenching is reduced somewhat
as the quencher is placed toward
the 3' end. The lowest apparent quenching
is observed for probe A1-19 (see Fig.
3) rather than for the probe where the
TAMRA is at the 3' end (A1-26). This is
understandable, as the conformation of
the 3' end position would be expected to
be less restricted than the conformation
of an internal position. In effect, a
quencher at the 3' end is freer to adopt
conformations close to the 5' reporter
dye than is an internally placed
quencher. For the other three sets of
probes, the interpretation of RQ⁻ values
is less clear-cut. The A3 probes show the
same trend as A1, with the 3' TAMRA
probe having a larger RQ⁻ than the internal
TAMRA probe. For the P2 pair,
both probes have about the same RQ⁻
value. For the P5 probes, the RQ⁻ for the
3' probe is less than for the internally
labeled probe. Another factor that may
explain some of the observed variation is
that purity affects the RQ⁻ value. Although
all probes are HPLC purified, a
small amount of contamination with
unquenched reporter can have a large effect
on RQ⁻.

Although there may be a modest effect
on degree of quenching, the position
of the quencher apparently can
have a large effect on the efficiency of
probe cleavage. The most drastic effect is
observed with probe A1-2, where placement
of the TAMRA on the second nucleotide
reduces the efficiency of cleavage
to almost zero. For the A3, P2, and P5
probes, ΔRQ is much greater for the 3'
TAMRA probes as compared with the internal
TAMRA probes. This is explained
most easily by assuming that probes
with TAMRA at the 3' end are more likely
to be cleaved between reporter and
quencher than are probes with TAMRA
attached internally. For the A1 probes,
the cleavage efficiency of probe A1-7
must already be quite high, as ΔRQ does
not increase when the quencher is
placed closer to the 3' end. This illustrates
the importance of being able to
use probes with a quencher on the 3'
end in the 5' nuclease PCR assay. In this
assay, an increase in the intensity of reporter
fluorescence is observed only
when the probe is cleaved between the
reporter and quencher dyes. By placing
the reporter and quencher dyes on the
opposite ends of an oligonucleotide
probe, any cleavage that occurs will be
detected. When the quencher is attached
to an internal nucleotide, sometimes the
probe works well (A1-7) and other times
not so well (A3-6). The relatively poor
performance of probe A3-6 presumably
means the probe is being cleaved 3' to
the quencher rather than between the
reporter and quencher. Therefore, the
best chance of having a probe that reliably
detects accumulation of PCR product
in the 5' nuclease PCR assay is to use
a probe with the reporter and quencher
dyes on opposite ends.

Placing the quencher dye on the 3' 
end may also provide a slight benefit in 
terms of hybridization efficiency. The 
presence of a quencher attached to an 
internal nucleotide might be expected to 
disrupt base-pairing and reduce the Tm
of a probe. In fact, a 2°C-3°C reduction
in Tm has been observed for two probes
with internally attached TAMRAs. w This 
disruptive effect would be minimized by 
placing the quencher at the 3' end. Thus, 
probes with 3' quenchers might exhibit 
slightly higher hybridization efficiencies 
than probes with internal quenchers. 

The combination of increased cleavage
and hybridization efficiencies means
that probes with 3' quenchers probably
will be more tolerant of mismatches between
probe and target as compared
with internally labeled probes. This tolerance
of mismatches can be advantageous,
as when trying to use a single
probe to detect PCR-amplified products
from samples of different species. Also, it
means that cleavage of probe during PCR
is less sensitive to alterations in annealing
temperature or other reaction
conditions. The one application where
tolerance of mismatches may be a disadvantage
is for allelic discrimination. Lee
et al.(1) demonstrated that allele-specific
probes were cleaved between reporter
and quencher only when hybridized to a
perfectly complementary target. This allowed
them to distinguish the normal
human cystic fibrosis allele from the
ΔF508 mutant. Their probes had TAMRA
attached to the seventh nucleotide from
the 5' end and were designed so that any
mismatches were between the reporter
and quencher. Increasing the distance
between reporter and quencher would
lessen the disruptive effect of mismatches
and allow cleavage of the probe
on the incorrect target. Thus, probes
with a quencher attached to an internal
nucleotide may still be useful for allelic
discrimination.

In this study loss of quenching upon 
hybridization was used to show that 
quenching by a 3' TAMRA is dependent 
on the flexibility of a single-stranded oli- 
gonucleotide. The increase in reporter 
fluorescence intensity, though, could 
also be used to determine whether hy- 
bridization has occurred or not. Thus, 
oligonucleotides with reporter and 
quencher dyes attached at opposite ends 
should also be useful as hybridization 
probes. The ability to detect hybridiza- 
tion in real time means that these probes 
could be used to measure hybridization 
kinetics. Also, this type of probe could be 
used to develop homogeneous hybrid- 
ization assays for diagnostics or other ap- 
plications. Bagwell et al. (10) describe just 
this type of homogeneous assay where 
hybridization of a probe causes an in- 
crease in fluorescence caused by a loss of 
quenching. However, they utilized a 
complex probe design that requires add- 
ing nucleotides to both ends of the 
probe sequence to form two imperfect
hairpins. The results presented here 
demonstrate that the simple addition of 
a reporter dye to one end of an oligonu- 
cleotide and a quencher dye to the other 
end generates a fluorogenic probe that 
can detect hybridization or PCR amplifi- 
cation. 
We have developed a novel "real time" quantitative PCR method. The method measures PCR product
accumulation through a dual-labeled fluorogenic probe (i.e., TaqMan Probe). This method provides very
accurate and reproducible quantitation of gene copies. Unlike other quantitative PCR methods, real-time PCR
does not require post-PCR sample handling, preventing potential PCR product carry-over contamination and
resulting in much faster and higher throughput assays. The real-time PCR method has a very large dynamic
range of starting target molecule determination (at least five orders of magnitude). Real-time quantitative
PCR is extremely accurate and less labor-intensive than current quantitative PCR methods.



Quantitative nucleic acid sequence analysis has
had an important role in many fields of biological
research. Measurement of gene expression
(RNA) has been used extensively in monitoring
biological responses to various stimuli (Tan et al.
1994; Huang et al. 1995a,b; Prud'homme et al.
1995). Quantitative gene analysis (DNA) has
been used to determine the genome quantity of a
particular gene, as in the case of the human HER2
gene, which is amplified in ~30% of breast tumors
(Slamon et al. 1987). Gene and genome
quantitation (DNA and RNA) also have been used
for analysis of human immunodeficiency virus
(HIV) burden demonstrating changes in the levels
of virus throughout the different phases of the
disease (Connor et al. 1993; Piatak et al. 1993b;
Furtado et al. 1995).

Many methods have been described for the
quantitative analysis of nucleic acid sequences
(both for RNA and DNA; Southern 1975; Sharp et
al. 1980; Thomas 1980). Recently, PCR has
proven to be a powerful tool for quantitative
nucleic acid analysis. PCR and reverse transcriptase
(RT)-PCR have permitted the analysis of
minimal starting quantities of nucleic acid (as
little as one cell equivalent). This has made possible
many experiments that could not have been
performed with traditional methods. Although
PCR has provided a powerful tool, it is imperative
that it be used properly for quantitation
(Raeymaekers 1995). Many early reports of quantitative
PCR and RT-PCR described quantitation of
the PCR product but did not measure the initial
target sequence quantity. It is essential to design
proper controls for the quantitation of the initial
target sequences (Ferre 1992; Clementi et al.
1993).

Researchers have developed several methods
of quantitative PCR and RT-PCR. One approach
measures PCR product quantity in the log phase
of the reaction before the plateau (Kellogg et al.
1990; Pang et al. 1990). This method requires
that each sample has equal input amounts of
nucleic acid and that each sample under analysis
amplifies with identical efficiency up to the point
of quantitative analysis. A gene sequence (contained
in all samples at relatively constant quantities,
such as β-actin) can be used for sample
amplification efficiency normalization. Using
conventional methods of PCR detection and
quantitation (gel electrophoresis or plate capture
hybridization), it is extremely laborious to assure
that all samples are analyzed during the log phase
of the reaction (for both the target gene and the
normalization gene). Another method, quantitative
competitive (QC)-PCR, has been developed
and is used widely for PCR quantitation. QC-PCR
relies on the inclusion of an internal control
competitor in each reaction (Becker-Andre 1991;
Piatak et al. 1993a,b). The efficiency of each reaction
is normalized to the internal competitor.
A known amount of internal competitor can be
added to each sample. To obtain relative quantitation,
the unknown target PCR product is compared
with the known competitor PCR product.
Success of a quantitative competitive PCR assay
relies on developing an internal control that amplifies
with the same efficiency as the target molecule.
The design of the competitor and the validation
of amplification efficiencies require a
dedicated effort. However, because QC-PCR does
not require that PCR products be analyzed during
the log phase of the amplification, it is the easier
of the two methods to use.

Several detection systems are used for quantitative
PCR and RT-PCR analysis: (1) agarose
gels, (2) fluorescent labeling of PCR products and
detection with laser-induced fluorescence using
capillary electrophoresis (Fasco et al. 1995; Williams
et al. 1996) or acrylamide gels, and (3) plate
capture and sandwich probe hybridization (Mulder
et al. 1994). Although these methods proved
successful, each method requires post-PCR manipulations
that add time to the analysis and
may lead to laboratory contamination. The
sample throughput of these methods is limited
(with the exception of the plate capture approach),
and, therefore, these methods are not
well suited for uses demanding high sample
throughput (i.e., screening of large numbers of
biomolecules or analyzing samples for diagnostics
or clinical trials).

Here we report the development of a novel
assay for quantitative DNA analysis. The assay is
based on the use of the 5' nuclease assay first
described by Holland et al. (1991). The method
uses the 5' nuclease activity of Taq polymerase to
cleave a nonextendible hybridization probe during
the extension phase of PCR. The approach
uses dual-labeled fluorogenic hybridization
probes (Lee et al. 1993; Bassler et al. 1993; Livak
et al. 1995a,b). One fluorescent dye serves as a
reporter [FAM (i.e., 6-carboxyfluorescein)] and its
emission spectra is quenched by the second fluorescent
dye, TAMRA (i.e., 6-carboxy-tetramethylrhodamine).
The nuclease degradation of the hybridization
probe releases the quenching of the
FAM fluorescent emission, resulting in an increase
in peak fluorescent emission at 518 nm.
The use of a sequence detector (ABI Prism) allows
measurement of fluorescent spectra of all 96 wells
of the thermal cycler continuously during the
PCR amplification. Therefore, the reactions are
monitored in real time. The output data is described
and quantitative analysis of input target
DNA sequences is discussed below.



REAL TIME QUANTITATIVE PCR

RESULTS 

PCR Product Detection in Real Time

The goal was to develop a high-throughput, sensitive,
and accurate gene quantitation assay for
use in monitoring lipid-mediated therapeutic
gene delivery. A plasmid encoding human factor
VIII gene sequence, pF8TM (see Methods), was
used as a model therapeutic gene. The assay uses
fluorescent Taqman methodology and an instrument
capable of measuring fluorescence in real
time (ABI Prism 7700 Sequence Detector). The
Taqman reaction requires a hybridization probe
labeled with two different fluorescent dyes. One
dye is a reporter dye (FAM), the other is a quenching
dye (TAMRA). When the probe is intact, fluorescent
energy transfer occurs and the reporter
dye fluorescent emission is absorbed by the
quenching dye (TAMRA). During the extension
phase of the PCR cycle, the fluorescent hybridization
probe is cleaved by the 5'-3' nucleolytic
activity of the DNA polymerase. On cleavage of
the probe, the reporter dye emission is no longer
transferred efficiently to the quenching dye, resulting
in an increase of the reporter dye fluorescent
emission spectra. PCR primers and probes
were designed for the human factor VIII sequence
and human β-actin gene (as described in
Methods). Optimization reactions were performed
to choose the appropriate probe and
magnesium concentrations yielding the highest
intensity of reporter fluorescent signal without
sacrificing specificity. The instrument uses a
charge-coupled device (i.e., CCD camera) for
measuring the fluorescent emission spectra from
500 to 650 nm. Each PCR tube was monitored
sequentially for 25 msec with continuous monitoring
throughout the amplification. Each tube
was re-examined every 8.5 sec. Computer software
was designed to examine the fluorescent intensity
of both the reporter dye (FAM) and
the quenching dye (TAMRA). The fluorescent
intensity of the quenching dye, TAMRA, changes
very little over the course of the PCR amplification
(data not shown). Therefore, the intensity
of TAMRA dye emission serves as an internal
standard with which to normalize the reporter
dye (FAM) emission variations. The software calculates
a value termed ΔRn (or ΔRQ) using the
following equation: ΔRn = (Rn⁺) - (Rn⁻), where
Rn⁺ = emission intensity of reporter/emission intensity
of quencher at any given time in a reaction
tube, and Rn⁻ = emission intensity of
reporter/emission intensity of quencher measured
prior to PCR amplification in that same reaction
tube. For the purpose of quantitation, the last
three data points (ΔRns) collected during the extension
step for each PCR cycle were analyzed.
The nucleolytic degradation of the hybridization
probe occurs during the extension phase of PCR,
and, therefore, reporter fluorescent emission increases
during this time. The three data points
were averaged for each PCR cycle and the mean
value for each was plotted in an "amplification
plot" shown in Figure 1A. The ΔRn mean value is
plotted on the y-axis, and time, represented by
cycle number, is plotted on the x-axis. During the
early cycles of the PCR amplification, the ΔRn



value remains at base line. When sufficient hybridization
probe has been cleaved by the Taq
polymerase nuclease activity, the intensity of reporter
fluorescent emission increases. Most PCR
amplifications reach a plateau phase of reporter
fluorescent emission if the reaction is carried out
to high cycle numbers. The amplification plot is
examined early in the reaction, at a point that
represents the log phase of product accumulation.
This is done by assigning an arbitrary
threshold that is based on the variability of the
base line data. In Figure 1A, the threshold was set
at 10 standard deviations above the mean of
base line emission calculated from cycles 1 to 15.
Once the threshold is chosen, the point at which
Figure 1 PCR product detection in real time. (A) The Model 7700 software will construct amplification plots
from the extension phase fluorescent emission data collected during the PCR amplification. The standard
deviation is determined from the data points collected from the base line of the amplification plot. CT values are
calculated by determining the point at which the fluorescence exceeds a threshold limit (usually 10 times the
standard deviation of the base line). (B) Overlay of amplification plots of serially (1:2) diluted human genomic
DNA samples amplified with β-actin primers. (C) Input DNA concentration of the samples plotted versus CT. All
the amplification plot crosses the threshold is defined
as CT. CT is reported as the cycle number at
this point. As will be demonstrated, the CT value
is predictive of the quantity of input target.
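
The threshold procedure described above can be sketched as follows. This is a minimal illustration and not the Model 7700 software: the function name, the use of linear interpolation between cycles, and the synthetic ΔRn series are all assumptions made for the example. The threshold follows the text (mean of the base line plus 10 standard deviations, with the base line taken from cycles 1 to 15).

```python
from statistics import mean, stdev


def threshold_cycle(delta_rn_by_cycle, baseline_cycles=15, n_sd=10.0):
    """Return the (fractional) cycle at which delta Rn first exceeds the threshold.

    delta_rn_by_cycle[0] is cycle 1. Returns None if the threshold is never crossed.
    Linear interpolation between adjacent cycles is an assumption of this sketch.
    """
    baseline = delta_rn_by_cycle[:baseline_cycles]
    threshold = mean(baseline) + n_sd * stdev(baseline)
    for i in range(baseline_cycles, len(delta_rn_by_cycle)):
        prev, cur = delta_rn_by_cycle[i - 1], delta_rn_by_cycle[i]
        if cur > threshold:
            # prev is the value at cycle i (1-based), cur the value at cycle i + 1.
            fraction = (threshold - prev) / (cur - prev)
            return i + fraction
    return None


# Synthetic amplification plot: 15 noisy base line cycles, then exponential rise.
baseline = [0.0, 0.02] * 7 + [0.0]
curve = baseline + [0.05, 0.1, 0.2, 0.4, 0.8]
print(threshold_cycle(curve))
```

A sample with fewer starting copies would produce the same rise several cycles later, so its computed CT would be correspondingly larger.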

CT Values Provide a Quantitative Measurement of
Input Target Sequences

Figure 1B shows amplification plots of 15 different
PCR amplifications overlaid. The amplifications
were performed on a 1:2 serial dilution of
human genomic DNA. The amplified target was
human β-actin. The amplification plots shift to
the right (to higher threshold cycles) as the input
target quantity is reduced. This is expected because
reactions with fewer starting copies of the
target molecule require greater amplification to
degrade enough probe to attain the threshold
fluorescence. An arbitrary threshold of 10 standard
deviations above the base line was used to
determine the CT values. Figure 1C represents the
CT values plotted versus the sample dilution
value. Each dilution was amplified in triplicate
PCR amplifications and plotted as mean values
with error bars representing one standard deviation.
The CT values decrease linearly with increasing
target quantity. Thus, CT values can be used
as a quantitative measurement of the input target
number. It should be noted that the amplification
plot for the 15.6-ng sample shown in Figure
1B does not reflect the same fluorescent rate of
increase exhibited by most of the other samples.
The 15.6-ng sample also achieves endpoint plateau
at a lower fluorescent value than would be
expected based on the input DNA. This phenomenon
has been observed occasionally with other
samples (data not shown) and may be attributable
to late cycle inhibition; this hypothesis is
still under investigation. It is important to note
that the flattened slope and early plateau do not
impact significantly the calculated CT value as
demonstrated by the fit on the line shown in
Figure 1C. All triplicate amplifications resulted in
very similar CT values; the standard deviation
did not exceed 0.5 for any dilution. This experiment
contains a >100,000-fold range of input target
molecules. Using CT values for quantitation
permits a much larger assay range than directly
using total fluorescent emission intensity for
quantitation. The linear range of fluorescent intensity
measurement of the ABI Prism 7700 Se-






ments over a very large range of relative starting target quantities.

Sample Preparation Validation

Several parameters influence the efficiency of PCR amplification: magnesium and salt concentrations, reaction conditions (i.e., time and temperature), PCR target size and composition, primer sequences, and sample purity. All of the above factors are common to a single PCR assay, except sample to sample purity. In an effort to validate the method of sample preparation for the factor VIII assay, PCR amplification reproducibility and efficiency of 10 replicate sample preparations were examined. After genomic DNA was prepared from the 10 replicate samples, the DNA was quantitated by ultraviolet spectroscopy. Amplifications were performed analyzing β-actin gene content in 100 and 25 ng of total genomic DNA. Each PCR amplification was performed in triplicate. Comparison of C_T values for each triplicate sample shows minimal variation based on standard deviation and coefficient of variance (Table 1). Therefore, each of the triplicate PCR amplifications was highly reproducible, demonstrating that real time PCR using this instrumentation introduces minimal variation into the quantitative PCR analysis. Comparison of the mean C_T values of the 10 replicate sample preparations also showed minimal variability, indicating that each sample preparation yielded similar results for β-actin gene quantity. The highest C_T difference between any of the samples was 0.85 and 0.73 for the 100 and 25 ng samples, respectively. Additionally, the amplification of each sample exhibited an equivalent rate of fluorescent emission intensity change per amount of DNA target analyzed, as indicated by similar slopes derived from the sample dilutions (Fig. 2). Any sample containing an excess of a PCR inhibitor would exhibit a greater measured β-actin C_T value for a given quantity of DNA. In addition, the inhibitor would be diluted along with the sample in the dilution analysis (Fig. 2), altering the expected C_T value change. Each sample amplification yielded a similar result in the analysis, demonstrating that this method of sample preparation is highly reproducible with regard to sample purity.
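The per-sample statistics reported in Table 1 (mean, standard deviation, and coefficient of variance of each triplicate) can be reproduced with a short helper. This is a minimal sketch; the function name `triplicate_summary` is hypothetical, and it assumes the sample standard deviation (n - 1 denominator) and a coefficient of variance expressed as a percentage of the mean.

```python
from statistics import mean, stdev


def triplicate_summary(ct_values):
    """Return (mean, sample SD, CV in percent) for replicate C_T values."""
    m = mean(ct_values)
    sd = stdev(ct_values)  # sample standard deviation, n - 1 denominator
    cv_percent = 100.0 * sd / m
    return m, sd, cv_percent


# Example: the three 100-ng C_T values 18.54, 18.67 and 19 from Table 1.
m, sd, cv = triplicate_summary([18.54, 18.67, 19.0])
print(f"mean={m:.2f} sd={sd:.2f} cv={cv:.2f}%")
```

Small differences from the tabulated coefficient of variance are expected if the published value was computed from rounded intermediate figures.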

Quantitative Analysis of a Plasmid After

Table 1. Reproducibility of Sample Preparation Method

100 ng
Sample no.   C_T (triplicate)          Mean          SD     CV
1            18.24   18.23   18.33     18.27         0.06   0.32
2            18.33   18.35   18.44     [illegible]   0.06   0.3?
3            18.3    18.3    18.42     18.34         0.07   0.36
4            18.15   18.23   18.32     18.23         0.08   0.46
5            18.4    18.38   18.46     18.42         0.04   0.23
6            18.54   18.67   19        18.74         0.24   1.26
7            18.28   18.36   18.52     18.39         0.12   0.66
8            18.45   18.7    18.73     18.63         0.16   0.83
9            18.18   18.34   18.36     18.29         0.1    0.55
10           18.42   18.57   18.66     18.55         0.12   0.65
Mean                                   18.42         0.17   0.90

25 ng
Sample no.   C_T (triplicate)          Mean          SD     CV
1            20.46   20.55   20.5      20.51         0.03   0.17
2            20.61   20.59   20.41     20.54         0.11   0.54
3            20.54   20.6    20.49     20.54         0.06   0.26
4            20.48   20.44   20.38     20.43         0.05   0.26
5            20.68   20.87   20.63     20.73         0.13   0.61
6            21.09   21.04   21.04     21.06         0.03   0.15
7            20.67   20.73   20.65     20.68         0.04   0.2
8            20.96   20.84   20.75     20.86         0.12   0.57
9            20.46   20.54   20.48     20.51         0.07   0.32
10           20.79   20.78   20.62     20.73         0.1    0.46
Mean                                   20.66         0.19   0.94



vector containing a partial cDNA for human factor VIII, pF8TM. A series of transfections was set up using a decreasing amount of the plasmid (40, 4, 0.5, and 0.1 µg). Twenty-four hours post-transfection, total DNA was purified from each flask of cells. β-Actin gene quantity was chosen as a value for normalization of genomic DNA concentration from each sample. In this experiment, β-actin gene content should remain constant relative to total genomic DNA. Figure 3 shows the result of the β-actin DNA measurement (100 ng total DNA determined by ultraviolet spectroscopy) of each sample. Each sample was analyzed in triplicate and the mean β-actin C_T values of the triplicates were plotted (error bars represent one standard deviation). The highest difference between any two sample means was [illegible] C_T. Ten nanograms of total DNA of each sample were also examined for β-actin. The results again showed that very similar amounts of genomic DNA were present; the maximum mean β-actin C_T value difference was 1.0. As Figure 3 shows, the rate of β-actin C_T change between the 100 and 10-ng samples was similar (slope values range between -3.56 and -3.45). This verifies again that the method of sample preparation yields samples of identical PCR integrity (i.e., no sample contained an excessive amount of a PCR inhibitor). However, these results indicate that each sample contained slight differences in the actual amount of genomic DNA analyzed. Determination of actual genomic DNA concentration was accomplished by plotting the mean β-actin C_T value obtained for each 100 ng sample on a β-actin standard curve (shown in Fig. 4C). The actual genomic DNA concentration of each sample, a, was obtained by extrapolation to the x-axis.

Figure 2 Sample preparation purity. The replicate samples shown in Table 1 were also amplified in triplicate using 25 ng of each DNA sample. The figure shows the input DNA concentration (100 and 25 ng) vs. C_T. In the figure, the 100 and 25 ng points for each sample are connected by a line.
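The slope values quoted for the β-actin dilutions can be related to amplification efficiency. The sketch below is not part of the original analysis: it assumes the slope is that of C_T plotted against log10 of input DNA, and it uses the standard relation E = 10^(-1/slope) - 1, under which perfect doubling per cycle corresponds to a slope of about -3.32. The function name is hypothetical.

```python
def amplification_efficiency(slope):
    """Per-cycle amplification efficiency implied by a C_T vs. log10(input) slope.

    An efficiency of 1.0 means the product doubles every cycle.
    Assumes the slope is negative, as for a dilution series.
    """
    return 10 ** (-1.0 / slope) - 1.0


for s in (-3.56, -3.45):
    print(s, round(amplification_efficiency(s), 3))
```

Under these assumptions, slopes in the reported range correspond to efficiencies somewhat below perfect doubling and close to one another, which is consistent with the statement that no sample contained an excess of inhibitor.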

Figure 3 Analysis of transfected cell DNA quantity and purity. The DNA preparations of the four 293 cell transfections (40, 4, 0.5, and 0.1 µg of pF8TM) were analyzed for the β-actin gene. 100 and 10 ng (determined by ultraviolet spectroscopy) of each sample were amplified in triplicate. For each amount of pF8TM that was transfected, the β-actin C_T values are plotted versus the total input DNA

Figure 4A shows the measured (i.e., non-normalized) quantities of factor VIII plasmid DNA (pF8TM) from each of the four transient cell transfections. Each reaction contained 100 ng of total sample DNA (as determined by UV spectroscopy). Each sample was analyzed in triplicate PCR amplifications. As shown, pF8TM purified from the 293 cells decreases (mean C_T values increase) with decreasing amounts of plasmid transfected. The mean C_T values obtained for pF8TM in Figure 4A were plotted on a standard curve comprised of serially diluted pF8TM, shown in Figure 4B. The quantity of pF8TM, b, found in each of the four transfections was determined by extrapolation to the x axis of the standard curve in Figure 4B. These uncorrected values, b, for pF8TM were normalized to determine the actual amount of pF8TM found per 100 ng of genomic DNA by using the equation:

$$\frac{b \times 100\ \text{ng}}{a} = \text{actual pF8TM copies per 100 ng of genomic DNA}$$

where a = actual genomic DNA in a sample and b = pF8TM copies from the standard curve. The normalized quantity of pF8TM per 100 ng of genomic DNA for each of the four transfections is shown in Figure 4D. These results show that the quantity of factor VIII plasmid associated with the 293 cells, 24 hr after transfection, decreases with decreasing plasmid concentration used in the transfection. The quantity of pF8TM associated with 293 cells, after transfection with 40 µg of plasmid, was 35 pg per 100 ng genomic DNA. This results in ~520 plasmid copies per cell.
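The two-step quantitation described above (read a quantity off a standard curve, then normalize to 100 ng of genomic DNA) can be sketched as follows. This is a minimal illustration under stated assumptions: the standard curve is modeled as a least-squares line of C_T against log10 of input quantity, and all function names and the example numbers in the usage are hypothetical rather than taken from the paper.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of C_T = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the input quantity for a C_T."""
    return 10 ** ((ct - intercept) / slope)


def normalize_per_100ng(b, a_ng):
    """Apply b x 100 ng / a: plasmid quantity per 100 ng of genomic DNA.

    b is the uncorrected plasmid quantity from the plasmid standard curve;
    a_ng is the actual genomic DNA (ng) from the beta-actin standard curve.
    """
    return b * 100.0 / a_ng
```

For example, a reaction whose β-actin C_T corresponds to 125 ng of genomic DNA and whose plasmid C_T corresponds to 50 units of plasmid would be reported as 40 units per 100 ng.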



DISCUSSION 

We have described a new method for quantitating gene copy numbers using real-time analysis of PCR amplifications. Real-time PCR is compatible with either of the two PCR (RT-PCR) approaches: (1) quantitative competitive, where an internal competitor for each target sequence is used for normalization (data not shown), or (2) quantitative comparative PCR using a normalization gene contained within the sample (i.e., β-actin) or a "housekeeping" gene for RT-PCR. If equal amounts of nucleic acid are analyzed for each sample and if the amplification efficiency before quantitative analysis is identical for each sample, the internal control (normalization gene or competitor) should give equal signals for all samples.

The real-time PCR method offers several advantages over the other two methods currently employed (see the Introduction). First, the real-time PCR method is performed in a closed-tube system and requires no post-PCR manipulation of sample. Therefore, the potential for PCR contamination in the laboratory is reduced because amplified products can be analyzed and disposed of without opening the reaction tubes. Second, this method supports the use of a normalization gene (i.e., β-actin) for quantitative PCR or housekeeping genes for quantitative RT-PCR controls. Analysis is performed in real time during the log phase of product accumulation. Analysis during log phase permits many different genes (over a wide input target range) to be analyzed simultaneously, without concern of reaching reaction plateau at different cycles. This will make multigene analysis assays much easier to develop, because individual internal competitors will not be needed for each gene under analysis. Third, sample throughput will increase dramatically with the new method because there is no post-PCR processing time. Additionally, working in a 96-well format is highly compatible with automation technology.

Figure 4 Quantitative analysis of pF8TM in transfected cells. (A) Amount of plasmid DNA used for the transfection plotted against the mean C_T value determined for pF8TM remaining 24 hr after transfection. (B,C) Standard curves of pF8TM and β-actin, respectively. pF8TM DNA (B) and genomic DNA (C) were diluted serially 1:5 before amplification with the appropriate primers. The β-actin standard curve was used to normalize the results of A to 100 ng of genomic DNA. (D) The amount of pF8TM present per 100 ng of genomic DNA.

The real-time PCR method is highly reproducible. Replicate amplifications can be analyzed for each sample, minimizing potential error. The system allows for a very large assay dynamic range (approaching 1,000,000-fold starting target). Using a standard curve for the target of interest, relative copy number values can be determined for any unknown sample. Fluorescent threshold values, C_T, correlate linearly with relative DNA copy numbers. Real time quantitative RT-PCR methodology (Gibson et al., this issue) has also been developed. Finally, real time quantitative PCR methodology can be used to develop high-throughput screening assays for a variety of applications [quantitative gene expression (RT-PCR), gene copy assays (Her2, HIV, etc.), genotyping (knockout mouse analysis), and immuno-PCR].

Real-time PCR may also be performed using intercalating dyes (Higuchi et al. 1992) such as ethidium bromide. The fluorogenic probe method offers a major advantage over intercalating dyes: greater specificity (i.e., primer dimers and nonspecific PCR products are not detected).

METHODS

Generation of a Plasmid Containing a Partial
cDNA for Human Factor VIII

Total RNA was harvested (RNAzol B from Tel-Test, Inc., Friendswood, TX) from cells transfected with a factor VIII expression vector, pCIS2.8c25D (Eaton et al. 1986; Gorman et al. 1990). A factor VIII partial cDNA sequence was generated by RT-PCR [GeneAmp EZ rTth RNA PCR Kit (part N808-0179, PE Applied Biosystems, Foster City, CA)] using the PCR primers F8for and F8rev (primer sequences are shown below). The amplicon was reamplified using modified F8for and F8rev primers (appended with BamHI and HindIII restriction site sequences at the 5' end) and cloned into pGEM-3Z (Promega Corp., Madison, WI). The resulting clone, pF8TM, was used for transient transfection of cells.

Amplification of Target DNA and Detection of
Amplicon

Factor VIII plasmid DNA (pF8TM) was amplified with the primers F8for 5'-[illegible]-3' and F8rev 5'-AAACCT[illegible]-3'. The reaction produced a [illegible]-bp PCR product. The forward primer was designed to recognize a unique sequence found in the 5' untranslated region of the parent pCIS2.8c25D plasmid and therefore does not recognize and amplify the human factor VIII gene. Primers were chosen with the assistance of the computer program Oligo 4.0 (National Biosciences, Inc., Plymouth, MN). The human β-actin gene was amplified with the primers β-actin forward primer 5'-TCACCCACACTGTGCCCATCTACGA-3' and β-actin reverse primer 5'-CAGCGGAACCGCTCATTGCCAATGG-3'. The reaction produced a 295-bp PCR product.

Amplification reactions (50 µl) contained a DNA sample, 10× PCR Buffer II (5 µl), 200 µM dATP, dCTP, dGTP, and 400 µM dUTP, 4 mM MgCl2, 1.25 Units AmpliTaq DNA polymerase, 0.5 unit AmpErase uracil N-glycosylase (UNG), 50 pmole of each factor VIII primer, and 15 pmole of each β-actin primer. The reactions also contained one of the following detection probes (100 nM each): F8probe 5'-(FAM)[illegible]GCCTT(TAMRA)p-3' and β-actin probe 5'-(FAM)ATG[illegible]X(TAMRA)CCCCCATGCCAT[illegible]p-3', where p indicates phosphorylation and X indicates a linker arm nucleotide. Reaction tubes were MicroAmp Optical Tubes (part number N801-0933, Perkin Elmer) that were frosted (at Perkin Elmer) to prevent light from reflecting. Tube caps were similar to MicroAmp Caps but specially designed to prevent light scattering. All of the PCR consumables were supplied by PE Applied Biosystems (Foster City, CA) except the factor VIII primers, which were synthesized at Genentech, Inc. (South San Francisco, CA). Probes were designed using the Oligo 4.0 software, following guidelines suggested in the Model 7700 Sequence Detector instrument manual. Briefly, probe Tm should be at least 5°C higher than the annealing temperature used during thermal cycling; primers should not form stable duplexes with the probe.

The thermal cycling conditions included 2 min at 50°C and 10 min at 95°C. Thermal cycling proceeded with

reactions were performed in the Model 7700 Sequence Detector (PE Applied Biosystems), which contains a GeneAmp PCR System 9600. Reaction conditions were programmed on a Power Macintosh 7100 (Apple Computer, Santa Clara, CA) linked directly to the Model 7700 Sequence Detector. Analysis of data was also performed on the Macintosh computer. Collection and analysis software was developed at PE Applied Biosystems.



Transfection of Cells with Factor VIII Construct

Four T175 flasks of 293 cells (ATCC CRL 1573), a human fetal kidney suspension cell line, were grown to 80% confluency and transfected with pF8TM. Cells were grown in the following media: 50% HAM'S F12 without GHT, 50% low glucose Dulbecco's modified Eagle medium (DMEM) without glycine with sodium bicarbonate, 10% fetal bovine serum, 2 mM L-glutamine, and 1% penicillin-streptomycin. The media was changed 30 min before the transfection. pF8TM DNA amounts of 40, 4, 0.5, and 0.1 µg were added to 1.5 ml of a solution containing 0.125 M CaCl2 and 1× HEPES. The four mixtures were left at room temperature for 10 min and then added dropwise to the cells. The flasks were incubated at 37°C and 5% CO2 for 24 hr, washed with PBS, and resuspended in PBS. The resuspended cells were divided into aliquots and DNA was extracted immediately using the QIAamp Blood Kit (Qiagen, Chatsworth, CA). DNA was eluted into 200 µl of 30 mM Tris-HCl at pH 8.0.

ABSTRACT Wnt family members are critical to many
developmental processes, and components of the Wnt signal- 
ing pathway have been linked to tumorigenesis in familial and 
sporadic colon carcinomas. Here we report the identification 
of two genes, WISP-1 and WISP-2, that are up-regulated in the
mouse mammary epithelial cell line C57MG transformed by
Wnt-1, but not by Wnt-4. Together with a third related gene,
WISP-3, these proteins define a subfamily of the connective
tissue growth factor family. Two distinct systems demon- 
strated WISP induction to be associated with the expression of 
Wnt-1. These included (i) C57MG cells infected with a Wnt-1 
retroviral vector or expressing Wnt-1 under the control of a 
tetracycline repressible promoter, and (ii) Wnt-1 transgenic
mice. The WISP-1 gene was localized to human chromosome 
8q24.1-8q24.3. WISP-1 genomic DNA was amplified in colon 
cancer cell lines and in human colon tumors and its RNA 
overexpressed (2- to >30-fold) in 84% of the tumors examined 
compared with patient-matched normal mucosa. WISP-3
mapped to chromosome 6q22-6q23 and also was overex-
pressed (4- to >40-fold) in 63% of the colon tumors analyzed.
In contrast, WISP-2 mapped to human chromosome 20q12-
20q13 and its DNA was amplified, but RNA expression was
reduced (2- to >30-fold) in 79% of the tumors. These results 
suggest that the WISP genes may be downstream of Wnt-1 
signaling and that aberrant levels of WISP expression in colon 
cancer may play a role in colon tumorigenesis. 



Wnt-1 is a member of an expanding family of cysteine-rich, 
glycosylated signaling proteins that mediate diverse develop- 
mental processes such as the control of cell proliferation, 
adhesion, cell polarity, and the establishment of cell fates (1, 
2). Wnt-1 originally was identified as an oncogene activated by 
the insertion of mouse mammary tumor virus in virus-induced 
mammary adenocarcinomas (3, 4). Although Wnt-1 is not 
expressed in the normal mammary gland, expression of Wnt-1 
in transgenic mice causes mammary tumors (5). 

In mammalian cells, Wnt family members initiate signaling
by binding to the seven-transmembrane spanning Frizzled
receptors and recruiting the cytoplasmic protein Dishevelled
(Dsh) to the cell membrane (1, 2, 6). Dsh then inhibits the
kinase activity of the normally constitutively active glycogen
synthase kinase-3β (GSK-3β) resulting in an increase in
β-catenin levels. Stabilized β-catenin interacts with the tran-
scription factor TCF/Lef1, forming a complex that appears in
the nucleus and binds TCF/Lef1 target DNA elements to
activate transcription (7, 8). Other experiments suggest that
the adenomatous polyposis coli (APC) tumor suppressor gene
also plays an important role in Wnt signaling by regulating
β-catenin levels (9). APC is phosphorylated by GSK-3β, binds
to β-catenin, and facilitates its degradation. Mutations in
either APC or β-catenin have been associated with colon
carcinomas and melanomas, suggesting these mutations con-
tribute to the development of these types of cancer, implicating
the Wnt pathway in tumorigenesis (1).

Although much has been learned about the Wnt signaling 
pathway over the past several years, only a few of the tran- 
scriptionally activated downstream components activated by 
Wnt have been characterized. Those that have been described 
cannot account for all of the diverse functions attributed to 
Wnt signaling. Among the candidate Wnt target genes are 
those encoding the nodal-related 3 gene, Xnr3, a member of
the transforming growth factor (TGF)-β superfamily, and the
homeobox genes, engrailed, goosecoid, twin (Xtwn), and siamois
(2). A recent report also identifies c-myc as a target gene of the
Wnt signaling pathway (10). 

To identify additional downstream genes in the Wnt signal- 
ing pathway that are relevant to the transformed cell pheno- 
type, we used a PCR-based cDNA subtraction strategy, sup- 
pression subtractive hybridization (SSH) (11), using RNA 
isolated from C57MG mouse mammary epithelial cells and 
C57MG cells stably transformed by a Wnt-1 retrovirus. Over- 
expression of Wnt-1 in this cell line is sufficient to induce a 
partially transformed phenotype, characterized by elongated 
and refractile cells that lose contact inhibition and form a 
multilayered array (12, 13). We reasoned that genes differen- 
tially expressed between these two cell lines might contribute 
to the transformed phenotype. 

In this paper, we describe the cloning and characterization 
of two genes up-regulated in Wnt-1 transformed cells, WISP-1 
and WISP-2, and a third related gene, WISP-3. The WISP genes
are members of the CCN family of growth factors, which 
includes connective tissue growth factor (CTGF), Cyr61, and 
nov, a family not previously linked to Wnt signaling. 

MATERIALS AND METHODS 

SSH. SSH was performed by using the PCR-Select cDNA 
Subtraction Kit (CLONTECH). Tester double-stranded 

Abbreviations: TGF, transforming growth factor; CTGF, connective
tissue growth factor; SSH, suppression subtractive hybridization;
VWC, von Willebrand factor type C module.
Data deposition: The sequences reported in this paper have been
deposited in the GenBank database (accession nos. AF100777,
AF100778, AF100779, AF100780, and AF100781).
†To whom reprint requests should be addressed, e-mail: diane@gene.com.






14718 Cell Biology, Medical Sciences: Pennica et al. 



Proc. Natl. Acad. Sci. USA 95 (1998)



cDNA was synthesized from 2 µg of poly(A)+ RNA isolated
from the C57MG/Wnt-1 cell line and driver cDNA from 2 µg
of poly(A)+ RNA from the parent C57MG cells. The sub-
tracted cDNA library was subcloned into a pGEM-T vector for 
further analysis. 

cDNA Library Screening. Clones encoding full-length 
mouse WISP-1 were isolated by screening a λgt10 mouse
embryo cDNA library (CLONTECH) with a 70-bp probe from
the original partial clone 568 sequence corresponding to amino
acids 128-169. Clones encoding full-length human WISP-1
were isolated by screening λgt10 lung and fetal kidney cDNA
libraries with the same probe at low stringency. Clones en-
coding full-length mouse and human WISP-2 were isolated by
screening a C57MG/Wnt-1 or human fetal lung cDNA library 
with a probe corresponding to nucleotides 1463-1512. Full- 
length cDNAs encoding WISP-3 were cloned from human 
bone marrow and fetal kidney libraries. 

Expression of Human WISP RNA. PCR amplification of 
first-strand cDNA was performed with human Multiple Tissue 
cDNA panels (CLONTECH) and 300 µM of each dNTP at
94°C for 1 sec, 62°C for 30 sec, 72°C for 1 min, for 22-32 cycles. 
WISP and glyceraldehyde-3-phosphate dehydrogenase primer 
sequences are available on request. 

In Situ Hybridization. 33P-labeled sense and antisense ribo-
probes were transcribed from an 897-bp PCR product corre- 
sponding to nucleotides 601-1440 of mouse WISP-1 or a 
294-bp PCR product corresponding to nucleotides 82-375 of 
mouse WISP-2. All tissues were processed as described (40). 

Radiation Hybrid Mapping. Genomic DNA from each 
hybrid in the Stanford G3 and Genebridge4 Radiation Hybrid 
Panels (Research Genetics, Huntsville, AL) and human and 
hamster control DNAs were PCR-amplified, and the results
were submitted to the Stanford or Massachusetts Institute of 
Technology web servers. 

Cell Lines, Tumors, and Mucosa Specimens. Tissue speci- 
mens were obtained from the Department of Pathology (Uni- 
versity of Pittsburgh) for patients undergoing colon resection 
and from the University of Leeds, United Kingdom. Genomic 
DNA was isolated (Qiagen) from the pooled blood of 10 
normal human donors, surgical specimens, and the following 
ATCC human cell lines: SW480, COLO 320DM, HT-29, 
WiDr, and SW403 (colon adenocarcinomas), SW620 (lymph 
node metastasis, colon adenocarcinoma), HCT 116 (colon 
carcinoma), SK-CO-1 (colon adenocarcinoma, ascites), and 
HM7 (a variant of ATCC colon adenocarcinoma cell line LS 
174T). DNA concentration was determined by using Hoechst 
dye 33258 intercalation fluorimetry. Total RNA was prepared
by homogenization in 7 M GuSCN followed by centrifugation 
over CsCl cushions or prepared by using RNAzol. 

Gene Amplification and RNA Expression Analysis. Relative 
gene amplification and RNA expression of WISPs and c-myc in 
the cell lines, colorectal tumors, and normal mucosa were 
determined by quantitative PCR. Gene-specific primers and 
fluorogenic probes (sequences available on request) were 
designed and used to amplify and quantitate the genes. The 
relative gene copy number was derived by using the formula
$2^{(\Delta Ct)}$, where ΔCt represents the difference in amplification
cycles required to detect the WISP genes in peripheral blood
lymphocyte DNA compared with colon tumor DNA or colon
tumor RNA compared with normal mucosal RNA. The
δ-method was used for calculation of the SE of the gene copy
number or RNA expression level. The WISP-specific signal was
normalized to that of the glyceraldehyde-3-phosphate dehy- 
drogenase housekeeping gene. All TaqMan assay reagents 
were obtained from Perkin-Elmer Applied Biosystems. 
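
The relative quantitation described in this section can be sketched as below. This is an illustrative sketch rather than the authors' analysis code: it assumes product doubles every cycle (which is what a base of 2 implies), takes ΔCt as the reference C_T minus the test-sample C_T, and applies the housekeeping-gene normalization as a simple ratio of fold changes. The function names are hypothetical.

```python
def fold_change(ct_reference, ct_sample):
    """Relative quantity 2^(delta Ct), with delta Ct = reference - sample.

    A sample that crosses the threshold earlier than the reference
    (lower C_T) gives a value greater than 1.
    """
    return 2.0 ** (ct_reference - ct_sample)


def normalized_fold_change(target_ref, target_sample,
                           control_ref, control_sample):
    """Fold change of the target gene divided by that of a control gene.

    The control stands in for a housekeeping gene such as
    glyceraldehyde-3-phosphate dehydrogenase.
    """
    return (fold_change(target_ref, target_sample)
            / fold_change(control_ref, control_sample))
```

For instance, a target detected 2 cycles earlier in tumor than in the reference, with the control gene detected 1 cycle earlier, gives a normalized 2-fold increase.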

RESULTS 

Isolation of WISP-1 and WISP-2 by SSH. To identify Wnt- 
1-inducible genes, we used the technique of SSH using the 



mouse mammary epithelial cell line C57MG and C57MG cells 
that stably express Wnt-1 (11). Candidate differentially ex- 
pressed cDNAs (1,384 total) were sequenced. Thirty-nine 
percent of the sequences matched known genes or homo- 
logues, 32% matched expressed sequence tags, and 29% had 
no match. To confirm that the transcript was differentially 
expressed, semiquantitative reverse transcription-PCR and 
Northern analysis were performed by using mRNA from the 
C57MG and C57MG/Wnt-1 cells. 

Two of the cDNAs, WISP-1 and WISP-2, were differentially 
expressed, being induced in the C57MG/Wnt-1 cell line, but 
not in the parent C57MG cells or C57MG cells overexpressing 
Wnt-4 (Fig. 1 A and B). Wnt-4, unlike Wnt-1, does not induce 
the morphological transformation of C57MG cells and has no 
effect on β-catenin levels (13, 14). Expression of WISP-1 was
up-regulated approximately 3-fold in the C57MG/Wnt-1 cell
line and WISP-2 by approximately 5-fold by both Northern 
analysis and reverse transcription-PCR. 

An independent, but similar, system was used to examine 
WISP expression after Wnt-1 induction. C57MG cells express- 
ing the Wnt-1 gene under the control of a tetracycline- 
repressible promoter produce low amounts of Wnt-1 in the 
repressed state but show a strong induction of Wnt-1 mRNA 
and protein within 24 hr after tetracycline removal (8). The 
levels of Wnt-1 and WISP RNA isolated from these cells at 
various times after tetracycline removal were assessed by 
quantitative PCR. Strong induction of Wnt-1 mRNA was seen
as early as 10 hr after tetracycline removal. Induction of WISP 
mRNA (2- to 6-fold) was seen at 48 and 72 hr (data not shown). 
These data support our previous observations that show that 
WISP induction is correlated with Wnt-1 expression. Because 
the induction is slow, occurring after approximately 48 hr, the 
induction of WISPs may be an indirect response to Wnt-1 
signaling. 

cDNA clones of human WISP-1 were isolated and the 
sequence compared with mouse WISP-1. The cDNA sequences 
of mouse and human WISP-1 were 1,766 and 2,830 bp in length, 
respectively, and encode proteins of 367 aa, with predicted 
relative molecular masses of ≈40,000 (Mr 40 K). Both have
hydrophobic N-terminal signal sequences, 38 conserved cys-
teine residues, and four potential N-linked glycosylation sites
and are 84% identical (Fig. 2A).

Full-length cDNA clones of mouse and human WISP-2 were
1,734 and 1,293 bp in length, respectively, and encode proteins
of 251 and 250 aa, respectively, with predicted relative molec-
ular masses of ≈27,000 (Mr 27 K) (Fig. 2B). Mouse and human
WISP-2 are 73% identical. Human WISP-2 has no potential
N-linked glycosylation sites, and mouse WISP-2 has one at
position 197. WISP-2 has 28 cysteine residues that are con-
served among the 38 cysteines found in WISP-1.

Fig. 1. WISP-1 and WISP-2 are induced by Wnt-1, but not Wnt-4,
expression in C57MG cells. Northern analysis of WISP-1 (A) and
WISP-2 (B) expression in C57MG, C57MG/Wnt-1, and C57MG/
Wnt-4 cells. Poly(A)+ RNA (2 µg) was subjected to Northern blot
analysis and hybridized with a 70-bp mouse WISP-1-specific probe
(amino acids 278-300) or a 190-bp WISP-2-specific probe (nucleotides
1438-1627) in the 3' untranslated region. Blots were rehybridized with
human β-actin probe.

Fig. 2. Encoded amino acid sequence alignment of mouse and
human WISP-1 (A) and mouse and human WISP-2 (B). The potential
signal sequence, insulin-like growth factor-binding protein (IGF-BP),
VWC, thrombospondin (TSP), and C-terminal (CT) domains are
underlined.

Identification of WISP-3. To search for related proteins, we
screened expressed sequence tag (EST) databases with the 
WISP-1 protein sequence and identified several ESTs as 
potentially related sequences. We identified a homologous 
protein that we have called WISP-3. A full-length human 
WISP-3 cDNA of 1,371 bp was isolated corresponding to those 
ESTs that encode a 354-aa protein with a predicted molecular 
mass of 39,293. WISP-3 has two potential N-linked glycosyl- 
ation sites and 36 cysteine residues. An alignment of the three 
human WISP proteins shows that WISP-1 and WISP-3 are the 
most similar (42% identity), whereas WISP-2 has 37% identity 
with WISP-1 and 32% identity with WISP-3 (Fig. 3A). 
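
The pairwise identities quoted above are read from a sequence alignment. A minimal sketch of that calculation follows, assuming the two sequences are already aligned with `-` as the gap character and that identity is counted over ungapped columns only. Other conventions divide by the alignment length or by the shorter sequence, and the text does not say which was used; the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    print(percent_identity("ACDE", "ACDF"))
```

The toy fragments are not WISP sequences; real input would be the aligned rows of Fig. 3A.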

WISPs Are Homologous to the CTGF Family of Proteins. 
Human WISP-1, WISP-2, and WISP-3 are novel sequences; 
however, mouse WISP-1 is the same as the recently identified 
Elm1 gene. Elm1 is expressed in low, but not high, metastatic 
mouse melanoma cells, and suppresses the in vivo growth and 
metastatic potential of K-1735 mouse melanoma cells (15). 
Human and mouse WISP-2 are homologous to the recently 
described rat gene, rCop-1 (16). Significant homology (36- 
44%) was seen to the CCN family of growth factors. This family 
includes three members, CTGF, Cyr61, and the protoonco- 
gene nov. CTGF is a chemotactic and mitogenic factor for 
fibroblasts that is implicated in wound healing and fibrotic 
disorders and is induced by TGF-β (17). Cyr61 is an extracellular 
matrix signaling molecule that promotes cell adhesion, 
proliferation, migration, angiogenesis, and tumor growth (18, 
19). nov (nephroblastoma overexpressed) is an immediate 
early gene associated with quiescence and found altered in 
Wilms tumors (20). The proteins of the CCN family share 
functional, but not sequence, similarity to Wnt-1. All are 
secreted, cysteine-rich heparin binding glycoproteins that as- 
sociate with the cell surface and extracellular matrix. 

Fig. 3. (A) Encoded amino acid sequence alignment of human 
WISPs. The cysteine residues of WISP-1 and WISP-2 that are not 
present in WISP-3 are indicated with a dot. (B) Schematic representation 
of the WISP proteins showing the domain structure and cysteine 
residues (vertical lines). The four cysteine residues in the VWC domain 
that are absent in WISP-3 are indicated with a dot. (C) Expression of 
WISP mRNA in human tissues. PCR was performed on human 
multiple-tissue cDNA panels (CLONTECH) from the indicated adult 
and fetal tissues. 

WISP proteins exhibit the modular architecture of the CCN 
family, characterized by four conserved cysteine-rich domains 
(Fig. 3B) (21). The N-terminal domain, which includes the first 
12 cysteine residues, contains a consensus sequence (GCGCCXXC) 
conserved in most insulin-like growth factor (IGF)-binding 
proteins (BP). This sequence is conserved in WISP-2 
and WISP-3, whereas WISP-1 has a glutamine in the third 
position instead of a glycine. CTGF recently has been shown 
to specifically bind IGF (22) and a truncated nov protein 
lacking the IGF-BP domain is oncogenic (23). The von Wil- 
lebrand factor type C module (VWC), also found in certain 
collagens and mucins, covers the next 10 cysteine residues, and 
is thought to participate in protein complex formation and 
oligomerization (24). The VWC domain of WISP-3 differs 
from all CCN family members described previously, in that it 
contains only six of the 10 cysteine residues (Fig. 3 A and B). 
A short variable region follows the VWC domain. The third 
module, the thrombospondin (TSP) domain is involved in 
binding to sulfated glycoconjugates and contains six cysteine 
residues and a conserved WSxCSxxCG motif first identified in 
thrombospondin (25). The C-terminal (CT) module contain- 
ing the remaining 10 cysteines is thought to be involved in 
dimerization and receptor binding (26). The CT domain is 
present in all CCN family members described to date but is 
absent in WISP-2 (Fig. 3 A and B). The existence of a putative 
signal sequence and the absence of a transmembrane domain 
suggest that WISPs are secreted proteins, an observation 
supported by an analysis of their expression and secretion from 
mammalian cell and baculovirus cultures (data not shown). 
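
The two consensus motifs named above (GCGCCXXC in the IGF-BP domain and WSxCSxxCG in the TSP domain) can be located in a protein sequence with a simple pattern search. The sketch below treats X/x as "any residue"; the function name, dictionary keys and test sequence are hypothetical and are not taken from the paper.

```python
import re

# X / x in the consensus stands for any residue, written "." in the pattern.
MOTIFS = {
    "IGF-BP (GCGCCXXC)": re.compile(r"GCGCC..C"),
    "TSP (WSxCSxxCG)": re.compile(r"WS.CS..CG"),
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif in the sequence."""
    seq = protein.upper()
    return {
        name: [m.start() + 1 for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Invented toy sequence containing one copy of each motif.
    print(find_motifs("MAAGCGCCKVCAAAWSPCSTSCGAA"))
```

A sequence with glutamine in the third position, as described for WISP-1, would not match the strict IGF-BP pattern; relaxing that position to `.` would be one way to accommodate it.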

Expression of WISP mRNA in Human Tissues. Tissue-specific 
expression of human WISPs was characterized by PCR 
analysis on adult and fetal multiple tissue cDNA panels. 
WISP-1 expression was seen in the adult heart, kidney, lung, 
pancreas, placenta, ovary, small intestine, and spleen (Fig. 3C). 
Little or no expression was detected in the brain, liver, skeletal 
muscle, colon, peripheral blood leukocytes, prostate, testis, or 
thymus. WISP-2 had a more restricted tissue expression and 
was detected in adult skeletal muscle, colon, ovary, and fetal 
lung. Predominant expression of WISP-3 was seen in adult 
kidney and testis and fetal kidney. Lower levels of WISP-3 
expression were detected in placenta, ovary, prostate, and 
small intestine. 

In Situ Localization of WISP-1 and WISP-2. Expression of 
WISP-1 and WISP-2 was assessed by in situ hybridization in 
mammary tumors from Wnt-1 transgenic mice. Strong expres- 
sion of WISP-1 was observed in stromal fibroblasts lying within 
the fibrovascular tumor stroma (Fig. 4 A-D). However, low- 
level WISP-1 expression also was observed focally within tumor 
cells (data not shown). No expression was observed in normal 
breast. Like WISP-1, WISP-2 expression also was seen in the 
tumor stroma in breast tumors from Wnt-1 transgenic animals 
(Fig. 4 E-H). However, WISP-2 expression in the stroma was 
in spindle-shaped cells adjacent to capillary vessels, whereas 
the predominant cell type expressing WISP-1 was the stromal 
fibroblasts. 

Fig. 4. (A, C, E, and G) Representative hematoxylin/eosin-stained 
images from breast tumors in Wnt-1 transgenic mice. The corresponding 
dark-field images showing WISP-1 expression are shown in B and 
D. The tumor is a moderately well-differentiated adenocarcinoma 
showing evidence of adenoid cystic change. At low power (A and B), 
expression of WISP-1 is seen in the delicate branching fibrovascular 
tumor stroma (arrowhead). At higher magnification, expression is seen 
in the stromal (s) fibroblasts (C and D), and tumor cells are negative. 
Focal expression of WISP-1, however, was observed in tumor cells in 
some areas. Images of WISP-2 expression are shown in E-H. At low 
power (E and F), expression of WISP-2 is seen in cells lying within the 
fibrovascular tumor stroma. At higher magnification, these cells 
appeared to be adjacent to capillary vessels whereas tumor cells are 
negative (G and H). 

Chromosome Localization of the WISP Genes. The chro- 
mosomal location of the human WISP genes was determined 
by radiation hybrid mapping panels. WISP-1 is approximately 
3.48 cR from the meiotic marker AFM259xc5 [logarithm of 
odds (lod) score 16.31] on chromosome 8q24.1 to 8q24.3, in the 
same region as the human locus of the novH family member 
(27) and roughly 4 Mbs distal to c-myc (28). Preliminary fine 
mapping indicates that WISP-1 is located near D8S1712 STS. 
WISP-2 is linked to the marker SHGC-33922 (lod = 1,000) on 
chromosome 20q12-20q13.1. Human WISP-3 mapped to chromosome 
6q22-6q23 and is linked to the marker AFM211ze5 
(lod = 1,000). WISP-3 is approximately 18 Mbs proximal to 
CTGF and 23 Mbs proximal to the human cellular oncogene 
MYB (27, 29). 

Amplification and Aberrant Expression of WISPs in Human 
Colon Tumors. Amplification of protooncogenes is seen in 
many human tumors and has etiological and prognostic sig- 
nificance. For example, in a variety of tumor types, c-myc 
amplification has been associated with malignant progression 
and poor prognosis (30). Because WISP-1 resides in the same 
general chromosomal location (8q24) as c-myc, we asked 
whether it was a target of gene amplification, and, if so, 
whether this amplification was independent of the c-myc locus. 
Genomic DNA from human colon cancer cell lines was 
assessed by quantitative PCR and Southern blot analysis (Fig. 
5 A and B). Both methods detected similar degrees of WISP-1 
amplification. Most cell lines showed significant (2- to 4-fold) 
amplification, with the HT-29 and WiDr cell lines demonstrat- 
ing an 8-fold increase. Significantly, the pattern of amplifica- 
tion observed did not correlate with that observed for c-myc, 
indicating that the c-myc gene is not part of the amplicon that 
involves the WISP-1 locus. 

We next examined whether the WISP genes were amplified 
in a panel of 25 primary human colon adenocarcinomas. The 
relative WISP gene copy number in each colon tumor DNA 
was compared with pooled normal DNA from 10 donors by 
quantitative PCR (Fig. 6). The copy number of WISP-1 and 
WISP-2 was significantly greater than one, approximately 
2-fold for WISP-1 in about 60% of the tumors and 2- to 4-fold 
for WISP-2 in 92% of the tumors (P < 0.001 for each). The 
copy number for WISP-3 was indistinguishable from one (P = 
0.166). In addition, the copy number of WISP-2 was significantly 
higher than that of WISP-1 (P < 0.001). 
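
The text reports relative gene copy number from quantitative PCR against pooled normal DNA but does not give the calculation in this section. One common approach is the comparative Ct method, sketched below as an assumption rather than as the authors' procedure. It also assumes a reference locus of unchanged copy number and perfect doubling per cycle; all names and Ct values are hypothetical.

```python
def relative_copy_number(
    ct_target_tumor: float,
    ct_ref_tumor: float,
    ct_target_normal: float,
    ct_ref_normal: float,
) -> float:
    """Comparative Ct estimate of tumor copy number relative to normal DNA.

    Assumes 100% amplification efficiency (one doubling per cycle).
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor
    delta_normal = ct_target_normal - ct_ref_normal
    delta_delta = delta_tumor - delta_normal
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Invented Ct values: the target crosses threshold one cycle earlier
    # in tumor DNA than in normal DNA, relative to the reference locus.
    print(relative_copy_number(24.0, 25.0, 25.0, 25.0))
```

A value near one corresponds to no amplification, which is how the WISP-3 result above would read under this scheme.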

Fig. 5. Amplification of WISP-1 genomic DNA in colon cancer cell 
lines. (A) Amplification in cell line DNA was determined by quantitative 
PCR. (B) Southern blots containing genomic DNA (10 μg) 
digested with EcoRI (WISP-1) or XbaI (c-myc) were hybridized with 
a 100-bp human WISP-1 probe (amino acids 186-219) or a human 
c-myc probe (located at bp 1901-2000). The WISP and myc genes are 
detected in normal human genomic DNA after a longer film exposure. 

Fig. 6. Genomic amplification of WISP genes in human colon 
tumors. The relative gene copy number of the WISP genes in 25 
adenocarcinomas was assayed by quantitative PCR, by comparing 
DNA from primary human tumors with pooled DNA from 10 healthy 
donors. The data are means ± SEM from one experiment done in 
triplicate. The experiment was repeated at least three times. 

Fig. 7. WISP RNA expression in primary human colon tumors 
relative to expression in normal mucosa from the same patient. 
Expression of WISP mRNA in 19 adenocarcinomas was assayed by 
quantitative PCR. The Dukes stage of the tumor is listed under the 
sample number. The data are means ± SEM from one experiment 
done in triplicate. The experiment was repeated at least twice. 

The levels of WISP transcripts in RNA isolated from 19 
adenocarcinomas and their matched normal mucosa were 
assessed by quantitative PCR (Fig. 7). The level of WISP-1 
RNA present in tumor tissue varied but was significantly 
increased (2- to >25-fold) in 84% (16/19) of the human colon 
tumors examined compared with normal adjacent mucosa. 
Four of 19 tumors showed greater than 10-fold overexpression. 
In contrast, in 79% (15/19) of the tumors examined, WISP-2 
RNA expression was significantly lower in the tumor than the 
mucosa. Similar to WISP-1, WISP-3 RNA was overexpressed in 
63% (12/19) of the colon tumors compared with the normal 
mucosa. The amount of overexpression of WISP-3 ranged from 
4- to >40-fold. 
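
The percentages quoted for the 19 matched tumor samples can be checked against the stated counts with a short calculation. The sketch below recomputes each one; the dictionary labels and function names are hypothetical, while the counts and percentages are those given in the text.

```python
def percent_of(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


# (count, total, percentage as reported in the text)
REPORTED = {
    "WISP-1 RNA increased": (16, 19, 84),
    "WISP-2 RNA decreased": (15, 19, 79),
    "WISP-3 RNA increased": (12, 19, 63),
}


def mismatches() -> list[str]:
    """List any entries whose recomputed percentage differs from the report."""
    return [
        name
        for name, (count, total, reported) in REPORTED.items()
        if percent_of(count, total) != reported
    ]


if __name__ == "__main__":
    print(mismatches() or "all reported percentages match the counts")
```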



DISCUSSION 

One approach to understanding the molecular basis of cancer 
is to identify differences in gene expression between cancer 
cells and normal cells. Strategies based on assumptions that 
steady-state mRNA levels will differ between normal and 
malignant cells have been used to clone differentially ex- 
pressed genes (31). We have used a PCR-based selection 
strategy, SSH, to identify genes selectively expressed in 
C57MG mouse mammary epithelial cells transformed by 
Wnt-1. 

Three of the genes isolated, WISP-1, WISP-2, and WISP-3, 
are members of the CCN family of growth factors, which 
includes CTGF, Cyr61, and nov, a family not previously linked 
to Wnt signaling. 

Two independent experimental systems demonstrated that 
WISP induction was associated with the expression of Wnt-1. 
The first was C57MG cells infected with a Wnt-1 retroviral 
vector or C57MG cells expressing Wnt-1 under the control of 
a tetracycline-repressible promoter, and the second was in 
Wnt-1 transgenic mice, where breast tissue expresses Wnt-1, 
whereas normal breast tissue does not. No WISP RNA expres- 
sion was detected in mammary tumors induced by polyoma 
virus middle T antigen (data not shown). These data suggest 
a link between Wnt-1 and WISPs in that in these two situations, 
WISP induction was correlated with Wnt-1 expression. 

It is not clear whether the WISPs are directly or indirectly 
induced by the downstream components of the Wnt-1 signaling 
pathway (i.e., β-catenin-TCF-1/Lef1). The increased levels of 
WISP RNA were measured in Wnt-1-transformed cells, hours 
or days after Wnt-1 transformation. Thus, WISP expression 
could result from Wnt-1 signaling directly through β-catenin 
transcription factor regulation or alternatively through Wnt-1 
signaling turning on a transcription factor, which in turn 
regulates WISPs. 

The WISPs define an additional subfamily of the CCN family 
of growth factors. One striking difference observed in the 
protein sequence of WISP-2 is the absence of a CT domain, 
which is present in CTGF, Cyr61, nov, WISP-1, and WISP-3. 
This domain is thought to be involved in receptor binding and 
dimerization. Growth factors, such as TGF-β, platelet-derived 
growth factor, and nerve growth factor, which contain a cystine 
knot motif exist as dimers (32). It is tempting to speculate that 
WISP-1 and WISP-3 may exist as dimers, whereas WISP-2 
exists as a monomer. If the CT domain is also important for 
receptor binding, WISP-2 may bind its receptor through a 
different region of the molecule than the other CCN family 
members. No specific receptors have been identified for CTGF 
or nov. A recent report has shown that integrin αvβ3 serves as 
an adhesion receptor for Cyr61 (33). 

The strong expression of WISP-1 and WISP-2 in cells lying 
within the fibrovascular tumor stroma in breast tumors from 
Wnt-1 transgenic animals is consistent with previous obser- 
vations that transcripts for the related CTGF gene are pri- 
marily expressed in the fibrous stroma of mammary tumors 
(34). Epithelial cells are thought to control the proliferation of 
connective tissue stroma in mammary tumors by a cascade of 
growth factor signals similar to that controlling connective 
tissue formation during wound repair. It has been proposed 
that mammary tumor cells or inflammatory cells at the tumor 
interstitial interface secrete TGF-β1, which is the stimulus for 
stromal proliferation (34). TGF-β1 is secreted by a large 
percentage of malignant breast tumors and may be one of the 
growth factors that stimulates the production of CTGF and 
WISPs in the stroma. 

It was of interest that WISP-1 and WISP-2 expression was 
observed in the stromal cells that surrounded the tumor cells 
(epithelial cells) in the Wnt-1 transgenic mouse sections of 
breast tissue. This finding suggests that paracrine signaling 
could occur in which the stromal cells could supply WISP-1 and 
WISP-2 to regulate tumor cell growth on the WISP extracel- 
lular matrix. Stromal cell-derived factors in the extracellular 
matrix have been postulated to play a role in tumor cell 
migration and proliferation (35). The localization of WISP-1 
and WISP-2 in the stromal cells of breast tumors supports this 
paracrine model. 

An analysis of WISP-1 gene amplification and expression in 
human colon tumors showed a correlation between DNA 
amplification and overexpression, whereas overexpression of 
WISP-3 RNA was seen in the absence of DNA amplification. 
In contrast, WISP-2 DNA was amplified in the colon tumors, 
but its mRNA expression was significantly reduced in the 
majority of tumors compared with the expression in normal 
colonic mucosa from the same patient. The gene for human 
WISP-2 was localized to chromosome 20q12-20q13, at a region 
frequently amplified and associated with poor prognosis in 
node-negative breast cancer and many colon cancers, suggesting 
the existence of one or more oncogenes at this locus 
(36-38). Because the center of the 20q13 amplicon has not yet 
been identified, it is possible that the apparent amplification 
observed for WISP-2 may be caused by another gene in this 
amplicon. 

A recent manuscript on rCop-1, the rat orthologue of 
WISP-2, describes the loss of expression of this gene after cell 
transformation, suggesting it may be a negative regulator of 
growth in cell lines (16). Although the mechanism by which 
WISP-2 RNA expression is down-regulated during malignant 
transformation is unknown, the reduced expression of WISP-2 
in colon tumors and cell lines suggests that it may function as 
a tumor suppressor. These results show that the WISP genes 
are aberrantly expressed in colon cancer and suggest that their 
altered expression may confer selective growth advantage to 
the tumor. 

Members of the Wnt signaling pathway have been impli- 
cated in the pathogenesis of colon cancer, breast cancer, and 
melanoma, including the tumor suppressor gene adenomatous 
polyposis coli and β-catenin (39). Mutations in specific regions 
of either gene can cause the stabilization and accumulation of 
cytoplasmic β-catenin, which presumably contributes to human 
carcinogenesis through the activation of target genes such 
as the WISPs. Although the mechanism by which Wnt-1 
transforms cells and induces tumorigenesis is unknown, the 
identification of WISPs as genes that may be regulated down- 
stream of Wnt-1 in C57MG cells suggests they could be 
important mediators of Wnt-1 transformation. The amplifica- 
tion and altered expression patterns of the WISPs in human 
colon tumors may indicate an important role for these genes 
in tumor development. 

methods. Peptides AENK or AEQK were dissolved in water, made isotonic with 
NaCl and diluted into RPMI growth medium. T-cell-proliferation assays were 
done essentially as described¹⁰,¹¹. Briefly, after antigen pulsing (30 μg ml⁻¹ 
TTCF) with tetrapeptides (1-2 mg ml⁻¹), PBMCs or EBV-B cells were 
washed in PBS and fixed for 45 s in 0.05% glutaraldehyde. Glycine was added 
to a final concentration of 0.1 M and the cells were washed five times in RPMI 
1640 medium containing 1% FCS before co-culture with T-cell clones in 
round-bottom 96-well microtitre plates. After 48 h, the cultures were pulsed 
with 1 μCi of ³H-thymidine and harvested for scintillation counting 16 h later. 
Predigestion of native TTCF was done by incubating 200 μg TTCF with 0.25 μg 
pig kidney legumain in 500 μl 50 mM citrate buffer, pH 5.5, for 1 h at 37 °C. 
Glycopeptide digestions. The peptides HIDNEEDI, HIDN(N-glucosamine)EEDI 
and HIDNESDI, which are based on the TTCF sequence, and 
QQQHLFGSNVTDCSGNFCLFR(KKK), which is based on human transferrin, 
were obtained by custom synthesis. The three C-terminal lysine residues were 
added to the natural sequence to aid solubility. The transferrin glycopeptide 
QQQHLFGSNVTDCSGNFCLFR was prepared by tryptic (Promega) digestion 
of 5 mg reduced, carboxy-methylated human transferrin followed by 
concanavalin A chromatography. Glycopeptides corresponding to residues 
622-642 and 421-452 were isolated by reverse-phase HPLC and identified by 
mass spectrometry and N-terminal sequencing. The lyophilized transferrin-derived 
peptides were redissolved in 50 mM sodium acetate, pH 5.5, 10 mM 
dithiothreitol, 20% methanol. Digestions were performed for 3 h at 30 °C with 
5-50 mU ml⁻¹ pig kidney legumain or B-cell AEP. Products were analysed by 
HPLC or MALDI-TOF mass spectrometry using a matrix of 10 mg ml⁻¹ 
α-cyanocinnamic acid in 50% acetonitrile/0.1% TFA and a PerSeptive Biosystems 
Elite STR mass spectrometer set to linear or reflector mode. Internal 
standardization was obtained with a matrix ion of 568.13 mass units. 

Fas ligand (FasL) is produced by activated T cells and natural 
killer cells and it induces apoptosis (programmed cell death) in 
target cells through the death receptor Fas/Apo1/CD95 (ref. 1). 
One important role of FasL and Fas is to mediate immune-cytotoxic 
killing of cells that are potentially harmful to the 
organism, such as virus-infected or tumour cells¹. Here we 
report the discovery of a soluble decoy receptor, termed decoy 
receptor 3 (DcR3), that binds to FasL and inhibits FasL-induced 
apoptosis. The DcR3 gene was amplified in about half of 35 
primary lung and colon tumours studied, and DcR3 messenger 
RNA was expressed in malignant tissue. Thus, certain tumours 
may escape FasL-dependent immune-cytotoxic attack by expressing 
a decoy receptor that blocks FasL. 

By searching expressed sequence tag (EST) databases, we identi- 
fied a set of related ESTs that showed homology to the tumour 
necrosis factor (TNF) receptor (TNFR) gene superfamily². Using 
the overlapping sequence, we isolated a previously unknown full- 
length complementary DNA from human fetal lung. We named the 
protein encoded by this cDNA decoy receptor 3 (DcR3). The cDNA 
encodes a 300-amino-acid polypeptide that resembles members of 
the TNFR family (Fig. 1a): the amino terminus contains a leader 
sequence, which is followed by four tandem cysteine-rich domains 
(CRDs). Like one other TNFR homologue, osteoprotegerin (OPG)³, 
DcR3 lacks an apparent transmembrane sequence, which indicates 
that it may be a secreted, rather than a membrane-associated, 
molecule. We expressed a recombinant, histidine-tagged form of 
DcR3 in mammalian cells; DcR3 was secreted into the cell culture 
medium, and migrated on polyacrylamide gels as a protein of 
relative molecular mass 35,000 (data not shown). DcR3 shares 
sequence identity in particular with OPG (31%) and TNFR2 
(29%), and has relatively less homology with Fas (17%). All of 
the cysteines in the four CRDs of DcR3 and OPG are conserved; 
however, the carboxy-terminal portion of DcR3 is 101 residues 
shorter. 

We analysed expression of DcR3 mRNA in human tissues by 
northern blotting (Fig. 1b). We detected a predominant 1.2-kilobase 
transcript in fetal lung, brain, and liver, and in adult spleen, colon 
and lung. In addition, we observed relatively high DcR3 mRNA 
expression in the human colon carcinoma cell line SW480. 

To investigate potential ligand interactions of DcR3, we generated 
a recombinant, Fc-tagged DcR3 protein. We tested binding of 
DcR3-Fc to human 293 cells transfected with individual TNF- 
family ligands, which are expressed as type 2 transmembrane 
proteins (these transmembrane proteins have their N termini in 
the cytosol). DcR3-Fc showed a significant increase in binding to 
cells transfected with FasL* (Fig. 2a), but not to cells transfected with 
TNF⁵, Apo2L/TRAIL⁶,⁷, Apo3L/TWEAK, or OPGL/TRANCE/ 
RANKL¹⁰⁻¹² (data not shown). DcR3-Fc immunoprecipitated shed 
FasL from FasL-transfected 293 cells (Fig. 2b) and purified soluble 
FasL (Fig. 2c), as did the Fc-tagged ectodomain of Fas but not 
TNFR1. Gel-filtration chromatography showed that DcR3-Fc and 
soluble FasL formed a stable complex (Fig. 2d). Equilibrium 
analysis indicated that DcR3-Fc and Fas-Fc bound to soluble 
FasL with a comparable affinity (Kd = 0.8 ± 0.2 and 
1.1 ± 0.1 nM, respectively; Fig. 2e), and that DcR3-Fc could 
block nearly all of the binding of soluble FasL to Fas-Fc (Fig. 2e, 
inset). Thus, DcR3 competes with Fas for binding to FasL. 
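
To give a sense of what dissociation constants of this size imply, the sketch below evaluates fractional occupancy under a simple one-site equilibrium model, fraction bound = C / (Kd + C). The model, the function name and the 1 nM example concentration are assumptions for illustration; the text reports only the Kd values.

```python
def fraction_bound(free_conc_nm: float, kd_nm: float) -> float:
    """Fractional occupancy for one-site binding at equilibrium."""
    if free_conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_conc_nm / (kd_nm + free_conc_nm)


# Mean Kd values from the text, in nM.
KD_NM = {"DcR3-Fc": 0.8, "Fas-Fc": 1.1}

if __name__ == "__main__":
    for receptor, kd in KD_NM.items():
        # 1 nM is an arbitrary example concentration.
        print(receptor, round(fraction_bound(1.0, kd), 2))
```

With Kd values this close, the two receptors give similar occupancy at any given concentration, which is consistent with the description of comparable affinity.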

To determine whether binding of DcR3 inhibits FasL activity, we 
tested the effect of DcR3-Fc on apoptosis induction by soluble 
FasL in Jurkat T leukaemia cells, which express Fas (Fig. 3a). DcR3- 
Fc and Fas-Fc blocked soluble-FasL-induced apoptosis in a 
similar dose-dependent manner, with half-maximal inhibition at 
~0.1 μg ml⁻¹. Time-course analysis showed that the inhibition did 
not merely delay cell death, but rather persisted for at least 24 hours 
(Fig. 3b). We also tested the effect of DcR3-Fc on activation-induced 
cell death (AICD) of mature T lymphocytes, a FasL-dependent 
process¹. Consistent with previous results, activation 
of interleukin-2-stimulated CD4-positive T cells with anti-CD3 
antibody increased the level of apoptosis twofold, and Fas-Fc 
blocked this effect substantially (Fig. 3c); DcR3-Fc blocked the 
induction of apoptosis to a similar extent. Thus, DcR3 binding 
blocks apoptosis induction by FasL. 

FasL-induced apoptosis is important in elimination of virus- 
infected cells and cancer cells by natural killer cells and cytotoxic T 
lymphocytes; an alternative mechanism involves perforin and 
granzymes 1,1 *""'. Peripheral blood natural killer cells triggered 
marked cell death in Jurkat T leukaemia cells (Fig. 3d); DcR3-Fc 
and Fas-Fc each reduced killing of target cells from ~65% to 
~30%, with half-maximal inhibition at ~1 μg ml⁻¹; the residual 
killing was probably mediated by the perforin/granzyme pathway. 
Thus, DcR3 binding blocks FasL-dependent natural killer cell 
activity. Higher DcR3-Fc and Fas-Fc concentrations were required 
to block natural killer cell activity compared with those required to 
block soluble FasL activity, which is consistent with the greater 
potency of membrane-associated FasL compared with soluble 
FasL". 

Figure 1 Primary structure and expression of human DcR3. a, Alignment of the 
amino-acid sequences of DcR3 and of osteoprotegerin (OPG): the C-terminal 101 
residues of OPG are not shown. The putative signal cleavage site (arrow), the 
cysteine-rich domains (CRD1-4), and the N-linked glycosylation site (asterisk) are 
shown. b, Expression of DcR3 mRNA. Northern hybridization analysis was done 
using the DcR3 cDNA as a probe and blots of poly(A)+ RNA (Clontech) from 
human fetal and adult tissues or cancer cell lines. PBL, peripheral blood 
lymphocyte. 

Figure 2 Interaction of DcR3 with FasL. a, 293 cells were transfected with pRK5 
vector (top) or with pRK5 encoding full-length FasL (bottom), incubated with 
DcR3-Fc (solid line, shaded area), TNFR1-Fc (dotted line) or buffer control 
(dashed line) (the dashed and dotted lines overlap), and analysed for binding by 
FACS. Statistical analysis showed a significant difference (P < 0.001) between the 
binding of DcR3-Fc to cells transfected with FasL or pRK5. PE, phycoerythrin-labelled 
cells. b, 293 cells were transfected as in a and metabolically labelled, and 
cell supernatants were immunoprecipitated with Fc-tagged TNFR1, DcR3 or Fas. 
c, Purified soluble FasL (sFasL) was immunoprecipitated with TNFR1-Fc, DcR3-Fc 
or Fas-Fc and visualized by immunoblot with anti-FasL antibody. sFasL was 
loaded directly for comparison in the right-hand lane. d, Flag-tagged sFasL was 
incubated with DcR3-Fc or with buffer and resolved by gel filtration; column 
fractions were analysed in an assay that detects complexes containing DcR3-Fc 
and sFasL-Flag. e, Equilibrium binding of DcR3-Fc or Fas-Fc to sFasL-Flag. 
Inset, competition of DcR3-Fc with Fas-Fc for binding to sFasL-Flag. 

Given the role of immune-cytotoxic cells in elimination of 
tumour cells and the fact that DcR3 can act as an inhibitor of 
FasL, we proposed that DcR3 expression might contribute to the 
ability of some tumours to escape immune-cytotoxic attack. As 
genomic amplification frequently contributes to tumorigenesis, we 
investigated whether the DcR3 gene is amplified in cancer. We 
analysed DcR3 gene-copy number by quantitative polymerase chain 
reaction (PCR) in genomic DNA from 35 primary lung and colon 
tumours, relative to pooled genomic DNA from peripheral blood 
leukocytes (PBLs) of 10 healthy donors. Eight of 18 lung tumours 
and 9 of 17 colon tumours showed DcR3 gene amplification, 
ranging from 2- to 18-fold (Fig. 4a, b). To confirm this result, we 
analysed the colon tumour DNAs with three more, independent sets 
of DcR3-based PCR primers and probes; we observed nearly the 
same amplification (data not shown). 

We then analysed DcR3 mRNA expression in primary tumour 
tissue sections by in situ hybridization. We detected DcR3 expression 
in 6 out of 15 lung tumours, 2 out of 2 colon tumours, 2 out of 5 
breast tumours, and 1 out of 1 gastric tumour (data not shown). A 
section through a squamous-cell carcinoma of the lung is shown in 
Fig. 4c. DcR3 mRNA was localized to infiltrating malignant epithe- 
lium, but was essentially absent from adjacent stroma, indicating 
tumour-specific expression. Although the individual tumour speci- 
mens that we analysed for mRNA expression and gene amplification 
were different, the in situ hybridization results are consistent with 
the finding that the DcR3 gene is amplified frequently in tumours. 
SW480 colon carcinoma cells, which showed abundant DcR3 
mRNA expression (Fig. 1b), also had marked DcR3 gene amplification, 
as shown by quantitative PCR (fourfold) and by Southern blot 
hybridization (fivefold) (data not shown). 

If DcR3 amplification in cancer is functionally relevant, then 
DcR3 should be amplified more than neighbouring genomic 
regions that are not important for tumour survival. To test this, 
we mapped the human DcR3 gene by radiation-hybrid analysis; 
DcR3 showed linkage to marker AFM218xe7 (T160), which maps to 
chromosome position 20q13. Next, we isolated from a bacterial 
artificial chromosome (BAC) library a human genomic clone that 
carries DcR3, and sequenced the ends of the clone's insert. We then 
determined, from the nine colon tumours that showed twofold or 
greater amplification of DcR3, the copy number of the DcR3- 
flanking sequences (reverse and forward) from the BAC, and of 
seven genomic markers that span chromosome 20 (Fig. 4d). The 
DcR3-linked reverse marker showed an average amplification of 
roughly threefold, slightly less than the approximately fourfold 
amplification of DcR3; the other markers showed little or no 
amplification. These data indicate that DcR3 may be at the 'epi- 
centre' of a distal chromosome 20 region that is amplified in colon 
cancer, consistent with the possibility that DcR3 amplification 
promotes tumour survival. 

Our results show that DcR3 binds specifically to FasL and inhibits 
FasL activity. We did not detect DcR3 binding to several other TNF- 
ligand-family members; however, this does not rule out the possi- 
bility that DcR3 interacts with other ligands, as do some other 
TNFR family members, including OPG². 

Figure 3 Inhibition of FasL activity by DcR3. a, Human Jurkat T leukaemia cells 
were incubated with Flag-tagged soluble FasL (sFasL; 5 ng ml⁻¹) oligomerized 
with anti-Flag antibody (0.1 μg ml⁻¹) in the presence of the proposed inhibitors 
DcR3-Fc, Fas-Fc or human IgG1 and assayed for apoptosis (mean ± s.e.m. of 
triplicates). b, Jurkat cells were incubated with sFasL-Flag plus anti-Flag antibody 
as in a, in the presence of 1 μg ml⁻¹ DcR3-Fc (filled circles), Fas-Fc (open circles) or 
human IgG1 (triangles), and apoptosis was determined at the indicated time 
points. c, Peripheral blood T cells were stimulated with PHA and interleukin-2, 
followed by control (white bars) or anti-CD3 antibody (filled bars), together with 
phosphate-buffered saline (PBS), human IgG1, Fas-Fc, or DcR3-Fc (10 μg ml⁻¹). 
After 16 h, apoptosis of CD4⁺ cells was determined (mean ± s.e.m. of results from 
five donors). d, Peripheral blood natural killer cells were incubated with ⁵¹Cr-labelled 
Jurkat cells in the presence of DcR3-Fc (filled circles), Fas-Fc (open 
circles) or human IgG1 (triangles), and target-cell death was determined by 
release of ⁵¹Cr (mean ± s.d. for two donors, each in triplicate). 

Figure 4 Genomic amplification of DcR3 in tumours. a, Lung cancers, comprising 
eight adenocarcinomas (c, d, f, g, h, j, k, r), seven squamous-cell carcinomas (a, e, 
m, n, o, p, q), one non-small-cell carcinoma (b), one small-cell carcinoma (i), and 
one bronchial adenocarcinoma (l). The data are means ± s.d. of 2 experiments 
done in duplicate. b, Colon tumours, comprising 17 adenocarcinomas. Data are 
means ± s.e.m. of five experiments done in duplicate. c, In situ hybridization 
analysis of DcR3 mRNA expression in a squamous-cell carcinoma of the lung. A 
representative bright-field image (left) and the corresponding dark-field image 
(right) show DcR3 mRNA over infiltrating malignant epithelium (arrowheads). 
Adjacent non-malignant stroma (S), blood vessel (V) and necrotic tumour tissue 
(N) are also shown. d, Average amplification of DcR3 compared with amplification 
of neighbouring genomic regions (reverse and forward, Rev and Fwd), the 
DcR3-linked marker T160, and other chromosome-20 markers, in the nine colon 
tumours showing DcR3 amplification of twofold or more (b). Data are from two 
experiments done in duplicate. Asterisk indicates P < 0.01 for a Student's t-test 
comparing each marker with DcR3. 

FasL is important in regulating the immune response; however, 
little is known about how FasL function is controlled. One mechanism 
involves the molecule cFLIP, which modulates apoptosis signalling 
downstream of Fas²⁰. A second mechanism involves proteolytic 
shedding of FasL from the cell surface. DcR3 competes with Fas for 
FasL binding; hence, it may represent a third mechanism of 
extracellular regulation of FasL activity. A decoy receptor that 
modulates the function of the cytokine interleukin-1 has been 
described 21 . In addition, two decoy receptors that belong to the 
TNFR family, DcRl and DcR2, regulate the FasL-related apoptosis- 
inducing molecule Apo2L 2J . Unlike DcRl and DcR2, which are 
membrane-associated proteins, DcR3 is directly secreted into the 
extracellular space. One other secreted TNFR-family member is 
OPG 3 , which shares greater sequence homology with DcR3 (31%) 
than do DcRl (17%) or DcR2 (19%); OPG functions as a third 
decoy for Apo2L". Thus, DcR3 and OPG define a new subset of 
TNFR-family members that function as secreted decoys to mod- 
ulate ligands that induce apoptosis. Pox viruses produce soluble 
TNFR homologues that neutralize specific TNF-family ligands, 
thereby modulating the antiviral immune response 2 . Our results 
indicate that a similar mechanism, namely, production of a soluble 
decoy receptor for FasL, may contribute to immune evasion by 
certain tumours. D 



Methods 

Isolation of DcR3 cDNA. Several overlapping ESTs in GenBank (accession
numbers AA025672, AA025673 and W67560) and in Lifeseq™ (Incyte 
Pharmaceuticals; accession numbers 1339238, 1533571, 1533650, 1542861, 
1789372 and 2207027) showed similarity to members of the TNFR family. We 
screened human cDNA libraries by PCR with primers based on the region of 
EST consensus; fetal lung was positive for a product of the expected size. By 
hybridization to a PCR-generated probe based on the ESTs, one positive clone 
(DNA30942) was identified. When searching for potential alternatively spliced 
forms of DcR3 that might encode a transmembrane protein, we isolated 50 
more clones; the coding regions of these clones were identical in size to that of 
the initial clone (data not shown). 

Fc-fusion proteins (immunoadhesins). The entire DcR3 sequence, or the 
ectodomain of Fas or TNFR1, was fused to the hinge and Fc region of human 
IgG1, expressed in insect SF9 cells or in human 293 cells, and purified as
described 11 . 

Fluorescence-activated cell sorting (FACS) analysis. We transfected 293 
cells using calcium phosphate or Effectene (Qiagen) with pRK5 vector or pRK5 
encoding full-length human FasL (2 µg), together with pRK5 encoding CrmA
(2 µg) to prevent cell death. After 16 h, the cells were incubated with
biotinylated DcR3-Fc or TNFR1-Fc and then with phycoerythrin-conjugated
streptavidin (GibcoBRL), and were assayed by FACS. The data were analysed by 
Kolmogorov-Smirnov statistical analysis. There was some detectable staining 
of vector-transfected cells by DcR3-Fc; as these cells express little FasL (data 
not shown), it is possible that DcR3 recognized some other factor that is 
expressed constitutively on 293 cells. 

Immunoprecipitation. Human 293 cells were transfected as above, and 
metabolically labelled with [35S]cysteine and [35S]methionine (0.5 mCi;
Amersham). After 16 h of culture in the presence of z-VAD-fmk (10 µM),
the medium was immunoprecipitated with DcR3-Fc, Fas-Fc or TNFR1-Fc
(5 µg), followed by protein A-Sepharose (Repligen). The precipitates were
resolved by SDS-PAGE and visualized on a phosphorimager (Fuji BAS2000).
Alternatively, purified, Flag-tagged soluble FasL (1 µg) (Alexis) was incubated
with each Fc-fusion protein (1 µg), precipitated with protein A-Sepharose,
resolved by SDS-PAGE and visualized by immunoblotting with rabbit anti- 
FasL antibody (Oncogene Research). 

Analysis of complex formation. Flag-tagged soluble FasL (25 µg) was
incubated with buffer or with DcR3-Fc (40 µg) for 1.5 h at 24 °C. The reaction
was loaded onto a Superdex200 HR 10/30 column (Pharmacia) and developed
with PBS; 0.6-ml fractions were collected. The presence of DcR3-Fc-FasL
complex in each fraction was analysed by placing 100 µl aliquots into microtitre
wells precoated with anti-human IgG (Boehringer) to capture DcR3-Fc, 
followed by detection with biotinylated anti-Flag antibody Bio M2 (Kodak) and 
streptavidin-horseradish peroxidase (Amersham). Calibration of the column 
indicated an apparent relative molecular mass of the complex of 420K (data not 
shown), which is consistent with a stoichiometry of two DcR3-Fc homodimers 
to two soluble FasL homotrimers. 

Equilibrium binding analysis. Microtitre wells were coated with anti-human
IgG, blocked with 2% BSA in PBS. DcR3-Fc or Fas-Fc was added, followed by
serially diluted Flag-tagged soluble FasL. Bound ligand was detected with anti-
Flag antibody as above. In the competition assay, Fas-Fc was immobilized as
above, and the wells were blocked with excess IgG1 before addition of Flag-
tagged soluble FasL plus DcR3-Fc. 

T-cell AICD. CD3+ lymphocytes were isolated from peripheral blood of
individual donors using anti-CD3 magnetic beads (Miltenyi Biotech),
stimulated with phytohaemagglutinin (PHA; 2 µg ml⁻¹) for 24 h, and cultured
in the presence of interleukin-2 (100 U ml⁻¹) for 5 days. The cells were plated in
wells coated with anti-CD3 antibody (Pharmingen) and analysed for apoptosis
16 h later by FACS analysis of annexin-V-binding of CD4+ cells.

Natural killer cell activity. Natural killer cells were isolated from peripheral
blood of individual donors using anti-CD56 magnetic beads (Miltenyi
Biotech), and incubated for 16 h with 51Cr-loaded Jurkat cells at an effector-
to-target ratio of 1:1 in the presence of DcR3-Fc, Fas-Fc or human IgG1.
Target-cell death was determined by release of 51Cr in effector-target co-
cultures relative to release of 51Cr by detergent lysis of equal numbers of Jurkat
cells.
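
The relative-release readout described above can be sketched as a one-line calculation. This is only an illustration under stated assumptions: the function name and the example counts are hypothetical, and no correction for spontaneous release is applied because the text does not mention one.

```python
def percent_chromium_release(coculture_release: float, detergent_release: float) -> float:
    """51Cr released in an effector-target co-culture, expressed as a
    percentage of the 51Cr released by detergent lysis of an equal number
    of target cells (taken here as the maximum release)."""
    if detergent_release <= 0:
        raise ValueError("detergent (maximum) release must be positive")
    return 100.0 * coculture_release / detergent_release


# Hypothetical counts, for illustration only.
print(percent_chromium_release(coculture_release=250.0, detergent_release=1000.0))
```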

Gene-amplification analysis. Surgical specimens were provided by J. Kern 
(lung tumours) and P. Quirke (colon tumours). Genomic DNA was extracted 
(Qiagen) and the concentration was determined using Hoechst dye 33258 
intercalation fluorometry. Amplification was determined by quantitative PCR" 
using a TaqMan instrument (ABI). The method was validated by comparison of 
PCR and Southern hybridization data for the Myc and HER-2 oncogenes (data 
not shown). Gene-specific primers and fluorogenic probes were designed on 
the basis of the sequence of DcR3 or of nearby regions identified on a BAC 
carrying the human DcR3 gene; alternatively, primers and probes were based 
on Stanford Human Genome Center marker AFM218xe7 (T160), which is 
linked to DcR3 (likelihood score = 5.4), SHGC-36268 (T159), the nearest
available marker which maps to ~500 kilobases from T160, and five extra
markers that span chromosome 20. The DcR3-specific primer sequences were
5'-CTTCTTCGCGCACGCTG-3' and 5'-ATCACGCCGGCACCAG-3' and the
fluorogenic probe sequence was
5'-(FAM)-ACACGATGCGTGCTCCAAGCAGAAp-(TAMARA), where FAM is
5'-fluorescein phosphoramidite. Relative gene-copy numbers were derived
using the formula $2^{\Delta C_T}$, where $\Delta C_T$ is the
difference in amplification cycles required to detect DcR3 in peripheral blood
lymphocyte DNA compared to test DNA.
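
The copy-number formula above can be written out as a short calculation. The sketch below is a minimal illustration, assuming that the cycle difference is taken as the cycle number for peripheral blood lymphocyte DNA minus the cycle number for test DNA, and that product doubles at every cycle; the function name and the example cycle values are hypothetical and are not taken from the paper.

```python
def relative_copy_number(ct_reference_dna: float, ct_test_dna: float) -> float:
    """Relative gene-copy number computed as 2 ** delta_ct.

    delta_ct is the number of cycles needed to detect the gene in reference
    (peripheral blood lymphocyte) DNA minus the number needed in test DNA.
    Assumes the PCR product doubles at every cycle.
    """
    delta_ct = ct_reference_dna - ct_test_dna
    return 2.0 ** delta_ct


# Illustrative values only: detection two cycles earlier in test DNA
# corresponds to a fourfold relative copy number.
print(relative_copy_number(ct_reference_dna=27.0, ct_test_dna=25.0))
```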
ABC transporters (also known as traffic ATPases) form a large
family of proteins responsible for the translocation of a variety 
of compounds across membranes of both prokaryotes and 
eukaryotes 1 . The recently completed Escherichia coli genome 
sequence revealed that the largest family of paralogous E. coli 
proteins is composed of ABC transporters 2 . Many eukaryotic 
proteins of medical significance belong to this family, such as 
the cystic fibrosis transmembrane conductance regulator (CFTR), 
the P-glycoprotein (or multidrug-resistance protein) and the 
heterodimeric transporter associated with antigen processing 
(Tap1-Tap2). Here we report the crystal structure at 1.5 Å resolution
of HisP, the ATP-binding subunit of the histidine permease,
which is an ABC transporter from Salmonella typhimurium. We 
correlate the details of this structure with the biochemical, genetic 
and biophysical properties of the wild-type and several mutant 
HisP proteins. The structure provides a basis for understanding 
properties of ABC transporters and of defective CFTR proteins. 

ABC transporters contain four structural domains: two nucleo- 
tide-binding domains (NBDs), which are highly conserved 
throughout the family, and two transmembrane domains. In
prokaryotes these domains are often separate subunits which are
assembled into a membrane-bound complex; in eukaryotes the
domains are generally fused into a single polypeptide chain. The
periplasmic histidine permease of S. typhimurium and E. coli is a
well-characterized ABC transporter that is a good model for this
superfamily. It consists of a membrane-bound complex, HisQMP2,
which comprises integral membrane subunits, HisQ and HisM, and
two copies of HisP, the ATP-binding subunit. HisP, which has
properties intermediate between those of integral and peripheral
membrane proteins, is accessible from both sides of the membrane,
presumably by its interaction with HisQ and HisM. The two HisP
subunits form a dimer, as shown by their cooperativity in ATP
hydrolysis 5 , the requirement for both subunits to be present for
activity, and the formation of a HisP dimer upon chemical cross-
linking. Soluble HisP also forms a dimer 3 . HisP has been purified 
and characterized in an active soluble form 3 which can be recon- 
stituted into a fully active membrane-bound complex 8 . 

The overall shape of the crystal structure of the HisP monomer is 
that of an 'L' with two thick arms (arm I and arm II); the ATP- 
binding pocket is near the end of arm I (Fig. 1). A six-stranded β-
sheet (β3 and β8-β12) spans both arms of the L, with a domain of an
α- plus β-type structure (β1, β2, β4-β7, α1 and α2) on one side
(within arm I) and a domain of mostly α-helices (α3-α9) on the




Figure 1 Crystal structure of HisP. a, View of the dimer along an axis
perpendicular to its two-fold axis. The top and bottom of the dimer are suggested
to face towards the periplasmic and cytoplasmic sides, respectively (see text).
The thickness of arm II is about 25 Å, comparable to that of membrane. α-Helices
are shown in orange and β-sheets in green. b, View along the two-fold axis of the
HisP dimer, showing the relative displacement of the monomers not apparent in
a. The β-strands at the dimer interface are labelled. c, View of one monomer from
the bottom of arm I, as shown in a, towards arm II, showing the ATP-binding
pocket. a-c, The protein and the bound ATP are in 'ribbon' and 'ball-and-stick'
representations, respectively. Key residues discussed in the text are indicated in
c. These figures were prepared with MOLSCRIPT 23 . N, amino terminus; C, C
terminus.
Gene amplification is a common event in the progression of
human cancers, and amplified oncogenes have been shown to 
have diagnostic, prognostic and therapeutic relevance. A 
kinetic quantitative polymerase-chain-reaction (PCR) method, 
based on fluorescent TaqMan methodology and a new instru- 
ment (ABI Prism 7700 Sequence Detection System) capable 
of measuring fluorescence in real-time, was used to quantify 
gene amplification in tumor DNA. Reactions are character- 
ized by the point during cycling when PCR amplification is still 
in the exponential phase, rather than the amount of PCR 
product accumulated after a fixed number of cycles. None of 
the reaction components is limited during the exponential 
phase, meaning that values are highly reproducible in reac- 
tions starting with the same copy number. This greatly 
improves the precision of DNA quantification. Moreover, 
real-time PCR does not require post-PCR sample handling, 
thereby preventing potential PCR-product carry-over con- 
tamination; it possesses a wide dynamic range of quantifica- 
tion and results in much faster and higher sample throughput. 
The real-time PCR method was used to develop and validate
a simple and rapid assay for the detection and quantification
of the 3 most frequently amplified genes (myc, ccnd1 and
erbB2) in breast tumors. Extra copies of myc, ccnd1 and erbB2
were observed in 10, 23 and 15%, respectively, of 108 breast-
tumor DNA samples; the largest observed numbers of gene copies
were 4.6, 18.6 and 15.1, respectively. These results correlated
well with those of Southern blotting. The use of this new
semi-automated technique will make molecular analysis of
human cancers simpler and more reliable, and should find
broad applications in clinical and research settings. Int. J.
Cancer 78:661-666, 1998.
© 1998 Wiley-Liss, Inc.

Gene amplification plays an important role in the pathogenesis
of various solid tumors, including breast cancer, probably because
over-expression of the amplified target genes confers a selective
advantage. The first technique used to detect genomic amplification
was cytogenetic analysis. Amplification of several chromosome
regions, visualized either as extrachromosomal double minutes
(dmins) or as integrated homogeneously staining regions (HSRs),
is among the main visible cytogenetic abnormalities in breast
tumors. Other techniques such as comparative genomic hybridization
(CGH) (Kallioniemi et al., 1994) have also been used in broad
searches for regions of increased DNA copy numbers in tumor
cells, and have revealed some 20 amplified chromosome regions in
breast tumors. Positional cloning efforts are underway to identify
the critical gene(s) in each amplified region. To date, genes known
to be amplified frequently in breast cancers include myc (8q24),
ccnd1 (11q13), and erbB2 (17q12-q21) (for review, see Bieche and
Lidereau, 1995).

Amplification of the myc, ccnd1 and erbB2 proto-oncogenes
should have clinical relevance in breast cancer, since independent
studies have shown that these alterations can be used to identify
sub-populations with a worse prognosis (Berns et al., 1992;
Schuuring et al., 1992; Slamon et al., 1987). Muss et al. (1994)
suggested that these gene alterations may also be useful for the 
prediction and assessment of the efficacy of adjuvant chemotherapy 
and hormone therapy. 

However, published results diverge both in terms of the frequency
of these alterations and their clinical value. For instance,
over 500 studies in 10 years have failed to resolve the controversy
surrounding the link suggested by Slamon et al. (1987) between
erbB2 amplification and disease progression. These discrepancies
are partly due to the clinical, histological and ethnic heterogeneity
of breast cancer, but technical considerations are also probably
involved.

Specific genes (DNA) were initially quantified in tumor cells by 
means of blotting procedures such as Southern and slot blotting. 
These batch techniques require large amounts of DNA (5-10
µg/reaction) to yield reliable quantitative results. Furthermore,
meticulous care is required at all stages of the procedures to
generate blots of sufficient quality for reliable dosage analysis.
Recently, PCR has proven to be a powerful tool for quantitative
DNA analysis, especially with minimal starting quantities of tumor
samples (small, early-stage tumors and formalin-fixed, paraffin-
embedded tissues). 

Quantitative PCR can be performed by evaluating the amount of 
product either after a given number of cycles (end-point quantita- 
tive PCR) or after a varying number of cycles during the 
exponential phase (kinetic quantitative PCR). In the first case, an 
internal standard distinct from the target molecule is required to 
ascertain PCR efficiency. The method is relatively easy but implies 
generating, quantifying and storing an internal standard for each 
gene studied. Nevertheless, it is the most frequently applied 
method to date. 

One of the major advantages of the kinetic method is its rapidity 
in quantifying a new gene, since no internal standard is required (an 
external standard curve is sufficient). Moreover, the kinetic method 
has a wide dynamic range (at least 5 orders of magnitude), giving 
an accurate value for samples differing in their copy number. 
Unfortunately, the method is cumbersome and has therefore been 
rarely used. It involves aliquot sampling of each assay mix at 
regular intervals and quantifying, for each aliquot, the amplifica- 
tion product. Interest in the kinetic method has been stimulated by a 
novel approach using fluorescent TaqMan methodology and a new 
instrument (ABI Prism 7700 Sequence Detection System) capable 
of measuring fluorescence in real time (Gibson et al., 1996; Heid et
al., 1996). The TaqMan reaction is based on the 5' nuclease assay
first described by Holland et al. (1991). The latter uses the 5'
nuclease activity of Taq polymerase to cleave a specific fluorogenic
oligonucleotide probe during the extension phase of PCR. The
approach uses dual-labeled fluorogenic hybridization probes (Lee
et al., 1993). One fluorescent dye, covalently linked to the 5' end
of the oligonucleotide, serves as a reporter [FAM (i.e., 6-carboxy-
fluorescein)] and its emission spectrum is quenched by a second
fluorescent dye, TAMRA (i.e., 6-carboxy-tetramethyl-rhodamine)
attached to the 3' end. During the extension phase of the PCR
cycle, the fluorescent hybridization probe is hydrolyzed by the
5'-3' nucleolytic activity of DNA polymerase. Nuclease degradation
of the probe releases the quenching of FAM fluorescence
emission, resulting in an increase in peak fluorescence emission.
The fluorescence signal is normalized by dividing the emission 
intensity of the reporter dye (FAM) by the emission intensity of a 
reference dye (i.e., ROX, 6-carboxy-X-rhodamine) included in 
TaqMan buffer, to obtain a ratio defined as the Rn (normalized 
reporter) for a given reaction tube. The use of a sequence detector 
enables the fluorescence spectra of all 96 wells of the thermal 
cycler to be measured continuously during PCR amplification. 

The real-time PCR method offers several advantages over other
current quantitative PCR methods (Celi et al., 1994): (i) the
probe-based homogeneous assay provides a real-time method for
detecting only specific amplification products, since specific
hybridization of both the primers and the probe is necessary to generate a
signal; (ii) the Ct (threshold cycle) value used for quantification is
measured when PCR amplification is still in the log phase of PCR
product accumulation. This is the main reason why Ct is a more
reliable measure of the starting copy number than are end-point
measurements, in which a slight difference in a limiting component
can have a drastic effect on the amount of product; (iii) use of Ct
values gives a wider dynamic range (at least 5 orders of magnitude),
reducing the need for serial dilution; (iv) the real-time PCR
method is run in a closed-tube system and requires no post-PCR
sample handling, thus avoiding potential contamination; (v) the
system is highly automated, since the instrument continuously
measures fluorescence in all 96 wells of the thermal cycler during
PCR amplification and the corresponding software processes and
analyzes the fluorescence data; (vi) the assay is rapid, as results are
available just one minute after thermal cycling is complete; (vii) the
sample throughput of the method is high, since 96 reactions can be
analyzed in 2 hr.

Here, we applied this semi-automated procedure to determine 
the copy numbers of the 3 most frequently amplified genes in breast 
tumors (myc, ccnd1 and erbB2), as well as 2 genes (alb and app)
located in a chromosome region in which no genetic changes have
been observed in breast tumors. The results for 108 breast tumors
were compared with previous Southern-blot data for the same 
samples. 

MATERIAL AND METHODS 
Tumor and blood samples 

Samples were obtained from 108 primary breast tumors removed 
surgically from patients at the Centre Rene Huguenin; none of the 
patients had undergone radiotherapy or chemotherapy. Immediately
after surgery, the tumor samples were placed in liquid
nitrogen until extraction of high-molecular-weight DNA. Patients 
were included in this study if the tumor sample used for DNA 
preparation contained more than 60% of tumor cells (histological 
analysis). A blood sample was also taken from 18 of the same 
patients. 

DNA was extracted from tumor tissue and blood leukocytes 
according to standard methods. 

Real-time PCR 

Theoretical basis. Reactions are characterized by the point 
during cycling when amplification of the PCR product is first 
detected, rather than by the amount of PCR product accumulated 
after a fixed number of cycles. The higher the starting copy number 
of the genomic DNA target, the earlier a significant increase in 
fluorescence is observed. The parameter Ct (threshold cycle) is
defined as the fractional cycle number at which the fluorescence
generated by cleavage of the probe passes a fixed threshold above
baseline. The target gene copy number in unknown samples is
quantified by measuring Ct and by using a standard curve to
determine the starting copy number. The precise amount of
genomic DNA (based on optical density) and its quality (i.e., lack
of extensive degradation) are both difficult to assess. We therefore
also quantified a control gene (alb) mapping to chromosome region
4q11-q13, in which no genetic alterations have been found in
breast-tumor DNA by means of CGH (Kallioniemi et al., 1994).

Thus, the ratio of the copy number of the target gene to the copy 
number of the alb gene normalizes the amount and quality of 
genomic DNA. The ratio defining the level of amplification is 
termed "N", and is determined as follows: 

$$N = \frac{\text{copy number of target gene } (app,\ myc,\ ccnd1,\ erbB2)}{\text{copy number of reference gene } (alb)}$$
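
The quantification steps described above (read Ct, convert it to a starting copy number with a standard curve, then take the target-to-alb ratio) can be sketched as follows. This is a minimal illustration, not the instrument software: it assumes a straight-line relationship between Ct and log10 of the starting copy number, fitted by ordinary least squares, with a separate curve for each gene. All function names are hypothetical and the Ct values in the usage lines are invented for illustration.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


def amplification_level(target_copies, alb_copies):
    """N = copy number of target gene / copy number of reference gene (alb)."""
    return target_copies / alb_copies


# Invented standards (10^5 to 10^2 copies) and Ct readings, for illustration.
slope, intercept = fit_standard_curve([1e5, 1e4, 1e3, 1e2], [22.0, 25.3, 28.6, 31.9])
target = copies_from_ct(25.0, slope, intercept)
reference = copies_from_ct(26.0, slope, intercept)  # a real assay would use the alb curve
print(amplification_level(target, reference))
```

In this sketch a sample whose target gene crosses the threshold one cycle before alb gives an N of roughly 2, which is the level the paper treats as the boundary for amplification.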

Primers, probes, reference human genomic DNA and PCR 
consumables. Primers and probes were chosen with the assistance 
of the computer programs Oligo 4.0 (National Biosciences, Ply- 
mouth, MN), EuGene (Daniben Systems, Cincinnati, OH) and Primer 
Express (Perkin-Elmer Applied Biosystems, Foster City, CA). 

Primers were purchased from DNAgency (Malvern, PA) and 
probes from Perkin-Elmer Applied Biosystems. 

Nucleotide sequences for the oligonucleotide hybridization 
probes and primers are available on request. 

The TaqMan PCR Core reagent kit, MicroAmp optical tubes, 
and MicroAmp caps were from Perkin-Elmer Applied Biosystems. 

Standard-curve construction. The kinetic method requires a 
standard curve. The latter was constructed with serial dilutions of 
specific PCR products, according to Piatak et al. (1993). In 
practice, each specific PCR product was obtained by amplifying 20
ng of a standard human genomic DNA (Boehringer, Mannheim,
Germany) with the same primer pairs as those used later for
real-time quantitative PCR. The 5 PCR products were purified
using MicroSpin S-400 HR columns (Pharmacia, Uppsala, Sweden),
electrophoresed through an acrylamide gel and stained with
ethidium bromide to check their quality. The PCR products were
then quantified spectrophotometrically and pooled, and serially
diluted 10-fold in mouse genomic DNA (Clontech, Palo Alto, CA)
at a constant concentration of 2 ng/µl. The standard curve used for
real-time quantitative PCR was based on serial dilutions of the pool
of PCR products ranging from $10^{-7}$ ($10^5$ copies of each gene) to
$10^{-10}$ ($10^2$ copies). This series of diluted PCR products was
aliquoted and stored at -80°C until use.

The standard curve was validated by analyzing 2 known
quantities of calibrator human genomic DNA (20 ng and 50 ng).

PCR amplification. Amplification mixes (50 µl) contained the
sample DNA (around 20 ng, around 6600 copies of disomic genes),
10X TaqMan buffer (5 µl), 200 µM dATP, dCTP, dGTP, and 400
µM dUTP, 5 mM MgCl2, 1.25 units of AmpliTaq Gold, 0.5 units of
AmpErase uracil N-glycosylase (UNG), 200 nM each primer and
100 nM probe. The thermal cycling conditions comprised an initial
2 min at 50°C and 10 min at 95°C, followed by 40 cycles at
95°C for 15 s and 65°C for 1 min. Each assay included: a standard
curve (from $10^5$ to $10^2$ copies) in duplicate, a no-template control,
20 ng and 50 ng of calibrator human genomic DNA (Boehringer) in 
triplicate, and about 20 ng of unknown genomic DNA in triplicate 
(26 samples can thus be analyzed on a 96-well microplate). All 
samples with a coefficient of variation (CV) higher than 10% were 
retested. 
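
The retest rule above can be illustrated with a short calculation. The paper does not state which quantity the coefficient of variation was computed on or which standard-deviation estimator was used, so this sketch assumes replicate copy-number estimates and the sample standard deviation; the function names and example values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def needs_retest(replicate_copy_numbers, cv_limit_percent=10.0):
    """Flag a sample whose replicate CV is higher than the limit."""
    return coefficient_of_variation(replicate_copy_numbers) > cv_limit_percent


# Invented triplicates, for illustration only.
print(needs_retest([6000.0, 6100.0, 5900.0]))  # tight replicates
print(needs_retest([6000.0, 9000.0, 4000.0]))  # scattered replicates
```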

All reactions were performed in the ABI Prism 7700 Sequence 
Detection System (Perkin-Elmer Applied Biosystems), which 
detects the signal from the fluorogenic probe during PCR. 

Equipment for real-time detection. The 7700 system has a
built-in thermal cycler and a laser directed via fiber optical cables
to each of the 96 sample wells. A charge-coupled-device (CCD)
camera collects the emission from each sample and the data are
analyzed automatically. The software accompanying the 7700
system calculates Ct and determines the starting copy number in the
samples.

Determination of gene amplification. Gene amplification was
calculated as described above. Only samples with an N value
higher than 2 were considered to be amplified.

RESULTS 

To validate the method, real-time PCR was performed on 
genomic DNA extracted from 108 primary breast tumors, and 18 
normal leukocyte DNA samples from some of the same patients. 
The target genes were the myc, ccnd1 and erbB2 proto-oncogenes,
and the β-amyloid precursor protein gene (app), which maps to a
chromosome region (21q21.2) in which no genetic alterations have
been found in breast tumors (Kallioniemi et al., 1994). The
reference disomic gene was the albumin gene (alb, chromosome
4q11-q13).



Validation of the standard curve and dynamic range 
of real-time PCR 

The standard curve was constructed from PCR products serially 
diluted in genomic mouse DNA at a constant concentration of 
2 ng/µl. It should be noted that the 5 primer pairs chosen to analyze
the 5 target genes do not amplify genomic mouse DNA (data not
shown). Figure 1 shows the real-time PCR standard curve for the
alb gene. The dynamic range was wide (at least 4 orders of
magnitude), with samples containing as few as $10^2$ copies or as
many as $10^5$ copies.

Copy-number ratio of the 2 reference genes (app and alb)

The app to alb copy-number ratio was determined in 18 normal
leukocyte DNA samples and all 108 primary breast-tumor DNA
samples. We selected these 2 genes because they are located in 2
chromosome regions (app, 21q21.2; alb, 4q11-q13) in which no
obvious genetic changes (including gains or losses) have been
observed in breast cancers (Kallioniemi et al., 1994). The ratio for
the 18 normal leukocyte DNA samples fell between 0.7 and 1.3
(mean 1.02 ± 0.21), and was similar for the 108 primary breast-
tumor DNA samples (0.6 to 1.6, mean 1.06 ± 0.25), confirming
that alb and app are appropriate reference disomic genes for
breast-tumor DNA. The low range of the ratios also confirmed that
the nucleotide sequences chosen for the primers and probes were
not polymorphic, as mismatches of their primers or probes with the
subject's DNA would have resulted in differential amplification.

Figure 1 - Albumin (alb) gene dosage by real-time PCR. Top: Amplification plots for reactions with starting alb gene copy number ranging
from $10^5$ (A9), $10^4$ (A7), $10^3$ (A4) to $10^2$ (A2) and a no-template control (A1). Cycle number is plotted vs. change in normalized reporter signal
(ΔRn). For each reaction tube, the fluorescence signal of the reporter dye (FAM) is divided by the fluorescence signal of the passive reference dye
(ROX), to obtain a ratio defined as the normalized reporter signal (Rn). ΔRn represents the normalized reporter signal (Rn) minus the baseline
signal established in the first 15 PCR cycles. ΔRn increases during PCR as alb PCR product copy number increases until the reaction reaches a
plateau. Ct (threshold cycle) represents the fractional cycle number at which a significant increase in Rn above a baseline signal (horizontal black
line) can first be detected. Two replicate plots were performed for each standard sample, but the data for only one are shown here. Bottom:
Standard curve plotting log starting copy number vs. Ct (threshold cycle). The black dots represent the data for standard samples plotted in
duplicate and the red dots the data for unknown genomic DNA samples plotted in triplicate. The standard curve shows 4 orders of linear dynamic
range.
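
The Rn, ΔRn and Ct quantities defined in the Figure 1 legend can be sketched in code. This is an illustration under stated assumptions rather than the algorithm of the 7700 software: the baseline is taken as the mean Rn of the first 15 cycles, and the fractional threshold cycle is found by linear interpolation between the two cycles that bracket the threshold. Function names, the threshold value and the example signal are hypothetical.

```python
def normalized_reporter(fam_emission, rox_emission):
    """Rn: reporter (FAM) emission divided by passive reference (ROX) emission."""
    return fam_emission / rox_emission


def delta_rn(rn_by_cycle, baseline_cycles=15):
    """Subtract the baseline (assumed: mean Rn of the first cycles) from each Rn."""
    if len(rn_by_cycle) < baseline_cycles:
        raise ValueError("not enough cycles to establish the baseline")
    baseline = sum(rn_by_cycle[:baseline_cycles]) / baseline_cycles
    return [rn - baseline for rn in rn_by_cycle]


def threshold_cycle(delta_rn_by_cycle, threshold):
    """Fractional cycle at which delta Rn first rises through the threshold.

    Cycles are numbered from 1. Returns None if the threshold is never crossed.
    """
    for i in range(1, len(delta_rn_by_cycle)):
        previous, current = delta_rn_by_cycle[i - 1], delta_rn_by_cycle[i]
        if previous < threshold <= current:
            # previous belongs to cycle i, current to cycle i + 1
            return i + (threshold - previous) / (current - previous)
    return None


# Invented signal: flat for 16 cycles, then rising.
rn = [1.0] * 16 + [1.2, 1.6, 2.4]
print(threshold_cycle(delta_rn(rn), threshold=0.4))
```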

myc, ccnd1 and erbB2 gene dose in normal leukocyte DNA

To determine the cut-off point for gene amplification in breast-
cancer tissue, 18 normal leukocyte DNA samples were tested for
the gene dose (N), calculated as described in "Material and
Methods". The N value of these samples ranged from 0.5 to 1.3
(mean 0.84 ± 0.22) for myc, 0.7 to 1.6 (mean 1.06 ± 0.23) for
ccnd1 and 0.6 to 1.3 (mean 0.91 ± 0.19) for erbB2. Since N values
for myc, ccnd1 and erbB2 in normal leukocyte DNA consistently
fell between 0.5 and 1.6, values of 2 or more were considered to
represent gene amplification in tumor DNA.

myc, ccnd1 and erbB2 gene dose in breast-tumor DNA

myc, ccnd1 and erbB2 gene copy numbers in the 108 primary
breast tumors are reported in Table I. Extra copies of ccnd1 were
more frequent (23%, 25/108) than extra copies of erbB2 (15%,
16/108) and myc (10%, 11/108), and ranged from 2 to 18.6 for
ccnd1, 2 to 15.1 for erbB2, and only 2 to 4.6 for the myc gene.
Figure 2 and Table II represent tumors in which the ccnd1 gene was
amplified 16-fold (T145), 6-fold (T133) and non-amplified (T118).
The 3 genes were never found to be co-amplified in the same tumor.
erbB2 and ccnd1 were co-amplified in only 3 cases, myc and ccnd1
in 2 cases and myc and erbB2 in 1 case. This favors the hypothesis
that gene amplifications are independent events in breast cancer.
Interestingly, 5 tumors showed a decrease of at least 50% in the
erbB2 copy number (N < 0.5), suggesting that they bore deletions
of the 17q21 region (the site of erbB2). No such decrease in copy
number was observed with the other 2 proto-oncogenes.

Comparison of gene dose determined by real-time quantitative
PCR and Southern-blot analysis

Southern-blot analysis of myc, ccnd1 and erbB2 amplifications
had previously been done on the same 108 primary breast tumors. A
perfect correlation between the results of real-time PCR and
Southern blot was obtained for tumors with high copy numbers
(N ≥ 5). However, there were cases (1 myc, 6 ccnd1 and 4 erbB2)
in which real-time PCR showed gene amplification whereas
Southern-blot did not, but these were mainly cases with low extra
copy numbers (N from 2 to 2.9). 

DISCUSSION 

The clinical applications of gene amplification assays are 
currently limited, but would certainly increase if a simple, standard- 
ized and rapid method were perfected. Gene amplification status 
has been studied mainly by means of Southern blotting, but this 
method is not sensitive enough to detect low-level gene amplifica- 
tion nor accurate enough to quantify the full range of amplification 
values. Southern blotting is also time-consuming, uses radioactive 



TABLE I - DISTRIBUTION OF AMPLIFICATION LEVEL (N) FOR myc,
ccnd1 AND erbB2 GENES IN 108 HUMAN BREAST TUMORS

                     Amplification level (N)
Gene     <0.5        0.5-1.9       2-4.9         ≥5
myc      0           97 (89.8%)    11 (10.2%)    0
ccnd1    0           83 (76.9%)    17 (15.7%)    8 (7.4%)
erbB2    5 (4.6%)    87 (80.6%)    8 (7.4%)      8 (7.4%)



reagents and requires relatively large amounts of high-quality 
genomic DNA, which means it cannot be used routinely in many 
laboratories. An amplification step is therefore required to deter- 
mine the copy number of a given target gene from minimal 
quantities of tumor DNA (small early-stage tumors, cytopuncture 
specimens or formalin-fixed, paraffin-embedded tissues). 

In this study, we validated a PCR method developed for the 
quantification of gene over-representation in tumors. The method,
based on real-time analysis of PCR amplification, has several
advantages over other PCR-based quantitative assays such as
competitive quantitative PCR (Celi et al., 1994). First, the real-time
PCR method is performed in a closed-tube system, avoiding the
risk of contamination by amplified products. Re-amplification of
carryover PCR products in subsequent experiments can also be
prevented by using the enzyme uracil N-glycosylase (UNG)
(Longo et al., 1990). The second advantage is the simplicity and
rapidity of sample analysis, since no post-PCR manipulations are
required. Our results show that the automated method is reliable.
We found it possible to determine, in triplicate, the number of
copies of a target gene in more than 100 tumors per day. Third, the
system has a linear dynamic range of at least 4 orders of magnitude, 
meaning that samples do not have to contain equal starting amounts 
of DNA. This technique should therefore be suitable for analyzing 
formalin-fixed, paraffin-embedded tissues. Fourth, and above all, 
real-time PCR makes DNA quantification much more precise and 
reproducible, since it is based on Ct values rather than end-point
measurement of the amount of accumulated PCR product. Indeed,
the ABI Prism 7700 Sequence Detection System enables Ct to be
calculated when PCR amplification is still in the exponential phase
and when none of the reaction components is rate-limiting. The
within-run CV of the Ct value for calibrator human DNA (5
replicates) was always below 5%, and the between-assay precision
in 5 different runs was always below 10% (data not shown). In
addition, the use of a standard curve is not absolutely necessary,
since the copy number can be determined simply by comparing the
Ct ratio of the target gene with that of reference genes. The results
obtained by the 2 methods (with and without a standard curve) are 
similar in our experiments (data not shown). Moreover, unlike 
competitive quantitative PCR, real-time PCR does not require an 
internal control (the design and storage of internal controls and the 
validation of their amplification efficiency is laborious). 
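
As an illustration of the standard-curve approach mentioned above, the sketch below fits a line of Ct against log10 of the starting copy number and then inverts it for an unknown sample. The linear relation is the usual working assumption for real-time PCR during the exponential phase; the calibrator values and function names are hypothetical and are not data from the study.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = intercept + slope * log10(copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical calibrator dilution series (illustrative values only).
calibrator_copies = [100, 1_000, 10_000, 100_000]
calibrator_cts = [33.4, 30.1, 26.8, 23.5]

slope, intercept = fit_standard_curve(calibrator_copies, calibrator_cts)
estimate = copies_from_ct(26.8, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, copies={estimate:.0f}")
```

Because the curve is fitted on a logarithmic axis, a small error in Ct translates into a multiplicative error in the estimated copy number, which is one reason replicate measurements are averaged.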

The only potential disadvantage of real-time PCR, like all other
PCR-based methods and solid-matrix blotting techniques (Southern
blots and dot blots), is that it cannot avoid dilution artifacts
inherent in the extraction of DNA from tumor cells contained in 
heterogeneous tissue specimens. Only FISH and immunohistochem- 
istry can measure alterations on a cell-by-cell basis (Pauletti et al., 
1996; Slamon et al., 1989). However, FISH requires expensive 
equipment and trained personnel and is also time-consuming. 
Moreover, FISH does not assess gene expression and therefore 
cannot detect cases in which the gene product is over-expressed in 
the absence of gene amplification, which will be possible in the 
future by real-time quantitative RT-PCR. Immunohistochemistry is 
subject to considerable variations in the hands of different teams, 
owing to alterations of target proteins during the procedure, the 
different primary antibodies and fixation methods used and the 
criteria used to define positive staining. 

The results of this study are in agreement with those reported in 
the literature. (i) Chromosome regions 4q11-q13 and 21q21.2
(which bear alb and app, respectively) showed no genetic alter-
ations in the breast-cancer samples studied here, in keeping with
the results of CGH (Kallioniemi et al., 1994). (ii) We found that
amplifications of these 3 oncogenes were independent events, as
reported by other teams (Berns et al., 1992; Borg et al., 1992). (iii)
The frequency and degree of myc amplification in our breast tumor
DNA series were lower than those of ccnd1 and erbB2 amplifica-
tion, confirming the findings of Borg et al. (1992) and Courjal et al.
(1997). (iv) The maxima of ccnd1 and erbB2 over-representation
were 18-fold and 15-fold, also in keeping with earlier results (about
Figure 2 - ccnd1 and alb gene dosage by real-time PCR in 3 breast tumor samples: T118 (E12, C6, black squares), T133 (G11, B4, red squares)
and T145 (A8, C8, blue squares). Given the Ct of each sample, the initial copy number is inferred from the standard curve obtained during the same
experiment. Triplicate plots were performed for each tumor sample, but the data for only one are shown here. The results are shown in Table II.



30-fold maximum) (Berns et al., 1992; Borg et al., 1992; Courjal et
al., 1997). (v) The erbB2 copy numbers obtained with real-time
PCR were in good agreement with data obtained with other
quantitative PCR-based assays in terms of the frequency and
degree of amplification (An et al., 1995; Deng et al., 1996; Valeron



et al., 1996). Our results also correlate well with those recently
published by Gelmini et al. (1997), who used the TaqMan system to
measure erbB2 amplification in a small series of breast tumors 
(n = 25), but with an instrument (LS-50B luminescence spectrom- 
eter, Perkin-Elmer Applied Biosystems) which only allows end- 
TABLE II - EXAMPLES OF ccnd1 GENE DOSAGE RESULTS
FROM 3 BREAST TUMORS¹

                   ccnd1                           alb
Tumor    Copy number    Mean     SD      Copy number    Mean    SD     Nccnd1/alb
T118         4525                            4223
             4605       4603     77          4365       4325    89        1.06
             4678                            4387
T133        59821                            9787
            61659      61100   1111         10092      10137   375        6.03
            61821                           10533
T145       128563                            7321
           125892     125392   3448          7762       7672   316       16.34
           121722                            7933

¹ For each sample, 3 replicate experiments were performed and the mean
and the standard deviation (SD) were determined. The level of ccnd1 gene
amplification (Nccnd1/alb) is determined by dividing the average ccnd1
copy number value by the average alb copy number value.
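
The calculation described in the footnote to Table II can be reproduced directly from the replicate copy numbers listed in the table. The sketch below is illustrative only: the replicate values are those of Table II, while the dictionary layout and function name are assumptions made for the example. The sample standard deviation is used, which matches the SD values printed in the table.

```python
from statistics import mean, stdev

# Replicate copy numbers as listed in Table II (3 replicates per gene).
table_ii = {
    "T118": {"ccnd1": [4525, 4605, 4678], "alb": [4223, 4365, 4387]},
    "T133": {"ccnd1": [59821, 61659, 61821], "alb": [9787, 10092, 10533]},
    "T145": {"ccnd1": [128563, 125892, 121722], "alb": [7321, 7762, 7933]},
}


def gene_dose(target, reference):
    """N = mean target copy number / mean reference copy number."""
    return mean(target) / mean(reference)


for tumor, replicates in table_ii.items():
    n = gene_dose(replicates["ccnd1"], replicates["alb"])
    print(
        f"{tumor}: ccnd1 mean={mean(replicates['ccnd1']):.0f} "
        f"SD={stdev(replicates['ccnd1']):.0f}, "
        f"alb mean={mean(replicates['alb']):.0f} "
        f"SD={stdev(replicates['alb']):.0f}, N={n:.2f}"
    )
```

Rounded to two decimals, the three ratios come out as 1.06, 6.03 and 16.34, in agreement with the Nccnd1/alb column.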



point measurement of fluorescence intensity. Here we report myc 
and ccnd1 gene dosage in breast cancer by means of quantitative
PCR. (vi) We found a high degree of concordance between
real-time quantitative PCR and Southern blot analysis in terms of
gene amplification, especially for samples with high copy numbers
(≥5-fold). The slightly higher frequency of gene amplification
(especially ccnd1 and erbB2) observed by means of real-time
quantitative PCR as compared with Southern-blot analysis may be
explained by the higher sensitivity of the former method. However, 
we cannot rule out the possibility that some tumors with a few extra 



gene copies observed in real-time PCR had additional copies of an 
arm or a whole chromosome (trisomy, tetrasomy or polysomy) 
rather than true gene amplification. These 2 types of genetic 
alteration (polysomy and gene amplification) could be easily 
distinguished in the future by using an additional probe located on 
the same chromosome arm, but some distance from the target gene. 
It is noteworthy that high gene copy numbers have the greatest 
prognostic significance in breast carcinoma (Borg et al., 1992;
Slamon et al., 1987).

Finally, this technique can be applied to the detection of gene 
deletion as well as gene amplification. Indeed, we found a 
decreased copy number of erbB2 (but not of the other 2 proto-
oncogenes) in several tumors; erbB2 is located in a chromosome
region (17q21) reported to contain both deletions and amplifica- 
tions in breast cancer (Bieche and Lidereau, 1995). 

In conclusion, gene amplification in various cancers can be used 
as a marker of pre-neoplasia, also for early diagnosis of cancer, 
staging, prognostication and choice of treatment. Southern blotting 
is not sufficiently sensitive, and FISH is lengthy and complex. 
Real-time quantitative PCR overcomes both these limitations, and 
is a sensitive and accurate method of analyzing large numbers of 
samples in a short time. It should find a place in routine clinical 
gene dosage. 
Genome-wide Study of Gene Copy Numbers,
Transcripts, and Protein Levels in Pairs of 
Non-invasive and Invasive Human Transitional 
Cell Carcinomas* 

Torben F. Ørntoft, Thomas Thykjaer, Frederic M. Waldman, Hans Wolf,
and Julio E. Celis



Gain and loss of chromosomal material is characteristic
of bladder cancer, as well as malignant transformation in
general. The consequences of these changes at both the
transcription and translation levels are at present unknown,
partly because of technical limitations. Here we have at-
tempted to address this question in pairs of non-invasive
and invasive human bladder tumors using a combination
of technology that included comparative genomic hybrid-
ization, high density oligonucleotide array-based monitor-
ing of transcript levels (5600 genes), and high resolution
two-dimensional gel electrophoresis. The results showed
that there is a gene dosage effect that in some cases
superimposes on other regulatory mechanisms. This ef-
fect depended (p < 0.015) on the magnitude of the com-
parative genomic hybridization change. In general (18 of
23 cases), chromosomal areas with more than 2-fold gain
of DNA showed a corresponding increase in mRNA tran-
scripts. Areas with loss of DNA, on the other hand,
showed either reduced or unaltered transcript levels. Be-
cause most proteins resolved by two-dimensional gels
are unknown it was only possible to compare mRNA and
protein alterations in relatively few cases of well focused
abundant proteins. With few exceptions we found a good
correlation (p < 0.005) between transcript alterations and
protein levels. The implications, as well as limitations,
of the approach are discussed. Molecular & Cellular
Proteomics 1:37-45, 2002.



Aneuploidy is a common feature of most human cancers
(1), but little is known about the genome-wide effect of this
phenomenon at both the transcription and translation levels.
High throughput array studies of the breast cancer cell line
BT474 have suggested that there is a correlation between
DNA copy numbers and gene expression in highly amplified
areas (2), and studies of individual genes in solid tumors
have revealed a good correlation between gene dose and
mRNA or protein levels in the case of c-erb-B2, cyclin D1,
ems1, and N-myc (3-5). However, a high cyclin D1 protein
expression has been observed without simultaneous am-
plification [...] c-myc copy number in-
crease was observed without concomitant c-myc protein
overexpression (6).

In human bladder tumors, karyotyping, fluorescent in situ
hybridization, and comparative genomic hybridization (CGH)¹
have revealed chromosomal aberrations that seem to be
characteristic of certain stages of disease progression. In the
case of non-invasive pTa transitional cell carcinomas (TCCs),
this includes loss of chromosome 9 or parts of it, as well as
loss of Y in males. In minimally invasive pT1 TCCs, the fol-
lowing alterations have been reported: 2q-, 11p-, 1q+,
11q13+, 17q+, and 20q+ (7-12). It has been suggested that
these regions harbor tumor suppressor genes and onco- 
genes; however, the large chromosomal areas involved often 
contain many genes, making meaningful predictions of the 
functional consequences of losses and gains very difficult. 

In this investigation we have combined genome-wide tech-
nology for detecting genomic gains and losses (CGH) with 
gene expression profiling techniques (microarrays and pro- 
teomics) to determine the effect of gene copy number on 
transcript and protein levels in pairs of non-invasive and in- 
vasive human bladder TCCs. 

EXPERIMENTAL PROCEDURES 
Material—Bladder tumor biopsies were sampled after informed
consent was obtained and after removal of tissue for routine pathol- 
ogy examination. By light microscopy tumors 335 and 532 were 
staged by an experienced pathologist as pTa (superficial papillary), 



¹ The abbreviations used are: CGH, comparative genomic hybrid-
ization; TCC, transitional cell carcinoma; LOH, loss of heterozygosity;
PA-FABP, psoriasis-associated fatty acid-binding protein; 2D, 
two-dimensional. 
grade I and II, respectively, tumors 733 and 827 were staged as pT1
(invasive into submucosa), 733 was staged as solid, and 827 was
staged as papillary, both grade III. 

mRNA Preparation—Tissue biopsies, obtained fresh from surgery,
were embedded immediately in a sodium-guanidinium thiocyanate
solution and stored at -80 °C. Total RNA was isolated using the
RNAzol B RNA isolation method (WAK-Chemie Medical GMBH).
Poly(A)+ RNA was isolated by an oligo(dT) selection step (Oligotex
mRNA kit; Qiagen). 

cRNA Preparation—1 µg of mRNA was used as starting material.
The first and second strand cDNA synthesis was performed using the
SuperScript® Choice System (Invitrogen) according to the manufac-
turer's instructions but using an oligo(dT) primer containing a T7 RNA
polymerase binding site. Labeled cRNA was prepared using the ME-
GAscript® in vitro transcription kit (Ambion). Biotin-labeled CTP and
UTP (Enzo) were used, together with unlabeled NTPs in the reaction.
Following the in vitro transcription reaction, the unincorporated nu- 
cleotides were removed using RNeasy columns (Qiagen). 

Array Hybridization and Scanning—Array hybridization and scan-
ning were modified from a previous method (13). 10 µg of cRNA was
fragmented at 94 °C for 35 min in buffer containing 40 mM Tris
acetate, pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization,
the fragmented cRNA in a 6× SSPE-T hybridization buffer (1 M NaCl,
10 mM Tris, pH 7.6, 0.005% Triton) was heated to 95 °C for 5 min,
subsequently cooled to 40 °C, and loaded onto the Affymetrix probe
array cartridge. The probe array was then incubated for 16 h at 40 °C
at constant rotation (60 rpm). The probe array was exposed to 10
washes in 6× SSPE-T at 25 °C followed by 4 washes in 0.5× SSPE-T
at 50 °C. The biotinylated cRNA was stained with a streptavidin-
phycoerythrin conjugate, 10 µg/ml (Molecular Probes) in 6× SSPE-T



for 30 min at 25 °C followed by 10 washes in 6× SSPE-T at 25 °C. The
probe arrays were scanned at 560 nm using a confocal laser scanning
microscope (made for Affymetrix by Hewlett-Packard). The readings 
from the quantitative scanning were analyzed by Affymetrix gene 
expression analysis software. 

Microsatellite Analysis—Microsatellite analysis was performed as
described previously (14). Microsatellites were selected by use of
www.ncbi.nlm.nih.gov/genemap98, and primer sequences were ob-
tained from the genome data base at www.gdb.org. DNA was extracted
from tumor and blood and amplified by PCR in a volume of 20 µl for 35
cycles. The amplicons were denatured and electrophoresed for 3 h in an
ABI Prism 377. Data were collected in the Gene Scan program for
fragment analysis. Loss of heterozygosity was defined as less than 33%
of one allele detected in tumor amplicons compared with blood.
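
The 33% criterion above can be sketched as a simple allele-ratio test. The text does not spell out the normalization, so the sketch assumes that each allele peak in the tumor is first divided by the same allele's peak in blood and that the weaker allele is then expressed relative to the stronger one. The peak values and function names are hypothetical.

```python
def allele_retention(tumor_peaks, blood_peaks):
    """Relative signal of the weaker tumor allele after normalizing to blood.

    Each argument is a pair (allele 1 peak, allele 2 peak). The blood
    peaks must be non-zero because they serve as the reference.
    """
    normalized = [t / b for t, b in zip(tumor_peaks, blood_peaks)]
    strongest = max(normalized)
    if strongest == 0:
        raise ValueError("no tumor signal for either allele")
    return min(normalized) / strongest


def has_loh(tumor_peaks, blood_peaks, threshold=0.33):
    """Call LOH when less than 33% of one allele remains in the tumor."""
    return allele_retention(tumor_peaks, blood_peaks) < threshold


# Hypothetical electropherogram peak heights (arbitrary units).
print(has_loh(tumor_peaks=(200, 1000), blood_peaks=(950, 1000)))
print(has_loh(tumor_peaks=(900, 1000), blood_peaks=(950, 1000)))
```

Normalizing to blood compensates for alleles that amplify unequally even in normal DNA, so that only a tumor-specific imbalance is scored.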

Proteomic Analysis—TCCs were minced into small pieces and
homogenized in a small glass homogenizer in 0.5 ml of lysis solution.
Samples were stored at -20 °C until use. The procedure for 2D gel
electrophoresis has been described in detail elsewhere (15, 16). Gels
were stained with silver nitrate and/or Coomassie Brilliant Blue. Pro-
teins were identified by a combination of procedures that included
microsequencing, mass spectrometry, two-dimensional gel Western
immunoblotting, and comparison with the master two-dimensional gel
image of human keratinocyte proteins; see biobase.dk/cgi-bin/celis.

CGH— Hybridization of differentially labeled tumor and normal DNA 
to normal metaphase chromosomes was performed as described 
previously (10). Fluorescein-labeled tumor DNA (200 ng), Texas Red-
labeled reference DNA (200 ng), and human Cot-1 DNA (20 µg) were
denatured at 37 °C for 5 min and applied to denatured normal met-
aphase slides. Hybridization was at 37 °C for 2 days. After washing,
the slides were counterstained with 0.15 µg/ml 4,6-diamidino-2-phe-
nylindole in an anti-fade solution. A second hybridization was per-
formed for all tumor samples using fluorescein-labeled reference DNA
and Texas Red-labeled tumor DNA (inverse labeling) to confirm the
aberrations detected during the initial hybridization. Each CGH ex-
periment also included a normal control hybridization using fluores-
cein- and Texas Red-labeled normal DNA. Digital image analysis was
used to identify chromosomal regions with abnormal fluorescence
ratios, indicating regions of DNA gains and losses. The average
green:red fluorescence intensity ratio profiles were calculated using
four images of each chromosome (eight chromosomes total) with
normalization of the green:red fluorescence intensity ratio for the
entire metaphase and background correction. Chromosome identifi-
cation was performed based on 4,6-diamidino-2-phenylindole band-
ing patterns. Only images showing uniform high intensity fluores-
cence with minimal background staining were analyzed. All
centromeres, p arms of acrocentric chromosomes, and heterochro- 
matic regions were excluded from the analysis. 

RESULTS 

Comparative Genomic Hybridization— The CGH analysis 
identified a number of chromosomal gains and losses in the 
Table I

Correlation between alterations detected by CGH and by expression monitoring

Top, CGH used as independent variable (if CGH alteration - what expression ratio was found); bottom, altered expression
used as independent variable (if expression alteration - what CGH deviation was found).

Top
Tumor pair     CGH alterations    Expression change clusters                           Concordance
733 vs. 335    13 Gain            10 Up-regulation, 0 Down-regulation, 3 No change     77%
733 vs. 335    10 Loss            1 Up-regulation, 5 Down-regulation, 4 No change      50%
827 vs. 532    10 Gain            8 Up-regulation, 0 Down-regulation, 2 No change      80%
827 vs. 532    12 Loss            3 Up-regulation, 2 Down-regulation, 7 No change      17%

Bottom
Tumor pair     Expression change clusters    CGH alterations                    Concordance
733 vs. 335    16 Up-regulation              11 Gain, 2 Loss, 3 No change       69%
733 vs. 335    21 Down-regulation            1 Gain, 8 Loss, 12 No change       38%
733 vs. 335    15 No change                  3 Gain, 3 Loss, 9 No change        60%
827 vs. 532    17 Up-regulation              10 Gain, 5 Loss, 2 No change       59%
827 vs. 532    9 Down-regulation             0 Gain, 3 Loss, 6 No change        33%
827 vs. 532    21 No change                  1 Gain, 3 Loss, 17 No change       81%



than 2-fold were regarded as informative (Fig. 1). The density
of genes along the chromosomes varied, and areas contain- 
ing only one gene were excluded from the calculations. The 
resolution of the CGH method is very low, and some of the 
outlier data may be because of the fact that the boundaries of 
the chromosomal aberrations are not known at high resolution. 

Two sets of calculations were made from the data. For the 
first set we used CGH alterations as the independent variable 
and estimated the frequency of expression alterations in these 
chromosomal areas. In general, areas with a strong gain of 
chromosomal material contained a cluster of genes having 
increased mRNA expression. For example, chromo-
somes 1q21-q25, 2p, and 9q showed a relative gain of more
than 100% in DNA copy number that was accompanied by
increased mRNA expression levels in the two tumor pairs (Fig.
1). In most cases, chromosomal gains detected by CGH were
accompanied by an increased level of transcripts in both
TCCs 733 (77%) and 827 (80%) (Table I, top). Chromosomal
losses, on the other hand, were not accompanied by de-
creased expression in several cases, and were often regis-
tered as having unaltered RNA levels (Table I, top). The inabil-
ity to detect RNA expression changes in these cases was not
because of fewer genes mapping to the lost regions (data not 
shown). 
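
The concordance percentages quoted for the first set of calculations follow from the counts in the top half of Table I: for each group of CGH alterations, concordance is the fraction of regions whose expression changed in the matching direction. The sketch below recomputes them; the counts are those of Table I, while the data layout and names are assumptions made for the example.

```python
# Counts from Table I (top): CGH alteration used as independent variable.
table_i_top = {
    ("733 vs. 335", "gain"): {"up": 10, "down": 0, "none": 3},
    ("733 vs. 335", "loss"): {"up": 1, "down": 5, "none": 4},
    ("827 vs. 532", "gain"): {"up": 8, "down": 0, "none": 2},
    ("827 vs. 532", "loss"): {"up": 3, "down": 2, "none": 7},
}

# Expression change that counts as concordant with each CGH alteration.
EXPECTED = {"gain": "up", "loss": "down"}


def concordance(cgh_change, counts):
    """Fraction of regions whose expression change matches the CGH change."""
    return counts[EXPECTED[cgh_change]] / sum(counts.values())


for (pair, cgh_change), counts in table_i_top.items():
    print(f"{pair} {cgh_change}: {concordance(cgh_change, counts):.0%}")
```

The recomputed values (77%, 50%, 80% and 17%) match the Concordance column and make the asymmetry between gains and losses explicit.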

In the second set of calculations we selected expression 
alterations above 2-fold as the independent variable and es- 
timated the frequency of CGH alterations in these areas. As 
above, we found that increased transcript expression corre- 
lated with gain of chromosomal material (TCC 733, 69% and 
TCC 827, 59%), whereas reduced expression was often de- 
tected in areas with unaltered CGH ratios (Table I, bottom). 
Furthermore, as a control we looked at areas with no alter- 



two invasive tumors (stage pT1, TCCs 733 and 827), whereas 
the two non-invasive papillomas (stage pTa, TCCs 335 and 
532) showed only 9p-, 9q22-q33-, and X-, and 7+, 9q-,
and Y-, respectively. Both invasive tumors showed changes
(1q22-24+, 2q14.1-qter-, 3q12-q13.3-, 6q12-q22-,
9q34+, 11q12-q13+, 17+, and 20q11.2-q12+) that are typ-
ical for their disease stage, as well as additional alterations,
some of which are shown in Fig. 1. Areas with gains and
losses deviated from the normal copy number to some extent,
and the average numerical deviation from normal was 0.4-fold
in the case of TCC 733 and 0.3-fold for TCC 827. The largest
changes, amounting to at least a doubling of chromosomal
content, were observed at 1q23 in TCC 733 (Fig. 1A) and
20q12 in TCC 827 (Fig. 1B).

mRNA Expression in Relation to DNA Copy Number—The
mRNA levels from the two invasive tumors (TCCs 827 and 
733) were compared with the two non-invasive counterparts 
(TCCs 532 and 335). This was done in two separate experi- 
ments in which we compared TCCs 733 to 335 and 827 to 
532, respectively, using two different scaling settings for the 
arrays to rule out scaling as a confounding parameter. Ap- 
proximately 1,800 genes that yielded a signal on the arrays 
were searched in the Unigene and Genemap data bases for 
chromosomal location, and those with a known location 
(1096) were plotted as bars covering their purported locus. In 
that way it was possible to construct a graphic presentation of 
DNA copy number and relative mRNA levels along the indi-
vidual chromosomes (Fig. 1).

For each mRNA a ratio was calculated between the level in 
the invasive versus the non-invasive counterpart. Bars, which 
represent chromosomal location of a gene, were color-coded 
according to the expression ratio, and only differences larger 
Fig. 2. Correlation between maximum CGH aberration and the ability to detect expression change by oligonucleotide array
monitoring. The aberration is shown as a numerical -fold change in ratio between invasive tumors 827 (▲) and 733 (♦) and their non-invasive
counterparts 532 and 335. The expression change was taken from the Expression line to the right in Fig. 1, which depicts the resulting
expression change for a given chromosomal region. At least half of the mRNAs from a given region have to be either up- or down-regulated
to be scored as an expression change. All chromosomal arms in which the CGH ratio plus or minus one standard deviation was outside the
ratio value of one were included.



ation in expression. No alteration was detected by CGH in 
most of these areas (TCC 733, 60% and TCC 827, 81%; see
Table I, bottom). Because the ability to observe reduced or
increased mRNA expression clustering to a certain chromo-
somal area clearly reflected the extent of copy number
changes, we plotted the maximum CGH aberrations in the
regions showing CGH changes against the ability to detect a
change in mRNA expression as monitored by the oligonucleo-
tide arrays (Fig. 2). For both tumors TCC 733 (p < 0.015) and
TCC 827 (p < 0.00003) a highly significant correlation was
observed between the level of CGH ratio change (reflecting
the DNA copy number) and alterations detected by the array
based technology (Fig. 2). Similar data were obtained when
areas with altered expression were used as independent vari- 
ables. These areas correlated best with CGH when the CGH 
ratio deviated 1.6- to 2.0-fold (Table I, bottom) but mostly did 
not at lower CGH deviations. These data probably reflect that 
loss of an allele may only lead to a 50% reduction in expres- 
sion level, which is at the cut-off point for detection of expres- 
sion alterations. Gain of chromosomal material can occur to a 
much larger extent. 

Microsatellite-based Detection of Minor Areas of Losses—In TCC 733,
several chromosomal areas exhibiting DNA
amplification were preceded or followed by areas with a nor-
mal CGH but reduced mRNA expression (see Fig. 1, TCC 733
chromosome 1q32, 2p21, and 7q21 and q32, 9q34, and
10q22). To determine whether these results were because of 
undetected loss of chromosomal material in these regions or 



because of other non-structural mechanisms regulating tran-
scription, we examined two microsatellites positioned at chro-
mosome 1q25-32 and two at chromosome 2p22. Loss of
heterozygosity (LOH) was found at both 1q25 and at 2p22 
indicating that minor deleted areas were not detected with the 
resolution of CGH (Fig. 3). Additionally, chromosome 2p in 
TCC 733 showed a CGH pattern of gain/no change/gain of 
DNA that correlated with transcript increase/decrease/in- 
crease. Thus, for the areas showing increased expression 
there was a correlation with the DNA copy number alterations 
(Fig. 1A). As indicated above, the mRNA decrease observed in
the middle of the chromosomal gain was because of LOH, 
implying that one of the mechanisms for mRNA down-regu- 
lation may be regions that have undergone smaller losses of 
chromosomal material. However, this cannot be detected with 
the resolution of the CGH method. 

In both TCC 733 and TCC 827, the telomeric end of chro- 
mosome 11p showed a normal ratio in the CGH analysis; 
however, clusters of five and three genes, respectively, lost 
their expression. Two microsatellites (D11S1760, D11S922)
positioned close to MUC2, IGF2, and cathepsin D indicated
LOH as the most likely mechanism behind the loss of expres- 
sion (data not shown). 

A reduced expression of mRNA observed in TCC 733 at 
chromosomes 3q24, 11p11, 12p12.2, 12q21.1, and 16q24 
and in TCC 827 at chromosome 11p15.5, 12p11, 15q11.2, 
and 18q12 was also examined for chromosomal losses using 
microsatellites positioned as close as possible to the gene loci 
Fig. 3. Microsatellite analysis of loss of heterozygosity. Tumor
733 showing loss of heterozygosity at chromosome 1q25, detected
(a) by D1S215 close to Hu class I histocompatibility antigen (gene
number 38 in Fig. 1), (b) by D1S2735 close to cathepsin E (gene
number 41 in Fig. 1), and (c) at chromosome 2p23 by D2S2251 close
to general β-spectrin (gene number 11 in Fig. 1), and (d) tumor 827
showing loss of heterozygosity at chromosome 18q12 by D18S1118
close to mitochondrial 3-oxoacyl-coenzyme A thiolase (gene number
12 in Fig. 1). The upper curves show the electropherogram obtained
from normal DNA from leukocytes (N), and the lower curves show the
electropherogram from tumor DNA (T). In all cases one allele is
partially lost in the tumor amplicon.

showing reduced mRNA transcripts. Only the microsatellite 
positioned at 18q12 showed LOH (Fig. 3), suggesting that 
transcriptional down-regulation of genes in the other regions 
may be controlled by other mechanisms. 

Relation between Changes in mRNA and Protein Levels— 
2D-PAGE analysis, in combination with Coomassie Brilliant 
Blue and/or silver staining, was carried out on all four tumors
using fresh biopsy material. 40 well resolved abundant known 
proteins migrating in areas away from the edges of the pH 
Fig. 4. Correlation between protein levels as judged by 2D-
PAGE and transcript ratio. For comparison proteins were divided in
three groups, unaltered in level or up- or down-regulated (horizontal
axis). The mRNA ratio as determined by oligonucleotide arrays was
plotted for each gene (vertical axis). ▲, mRNAs that were scored as
present in both tumors used for the ratio calculation; △, mRNAs that
were scored as absent in the invasive tumors (along horizontal axis) or
as absent in non-invasive reference (top of figure). Two different
scalings were used to exclude scaling as a confounder; TCCs 827
and 532 (▲△) were scaled with background suppression, and TCCs
733 and 335 (●○) were scaled without suppression. Both compari-
sons showed highly significant (p < 0.005) differences in mRNA ratios
between the groups. Proteins shown were as follows: Group A (from
left), phosphoglucomutase 1, glutathione transferase class μ number
4, fatty acid-binding protein homologue, cytokeratin 15, and cyto-
keratin 13; B (from left), fatty acid-binding protein homologue, 28-kDa
heat shock protein, cytokeratin 13, and calcyclin; C (from left), α-eno-
lase, hnRNP B1, 28-kDa heat shock protein, 14-3-3-t, and
pre-mRNA splicing factor; D, mesothelial keratin K7 (type II); E (from
top), glutathione S-transferase-π and mesothelial keratin K7 (type II);
F (from top and left), adenylyl cyclase-associated protein, E-cadherin,
keratin 19, calgizzarin, phosphoglycerate mutase, annexin IV, cy-
toskeletal γ-actin, hnRNP A1, integral membrane protein calnexin
(IP90), hnRNP H, brain-type clathrin light chain-a, hnRNP F, 70-kDa
heat shock protein, heterogeneous nuclear ribonucleoprotein A/B,
translationally controlled tumor protein, liver glyceraldehyde-3-phos-
phate dehydrogenase, keratin 8, aldehyde reductase, and Na,K-
ATPase β-1 subunit; G (from top and left), TCP20, calgizzarin, 70-
kDa heat shock protein, calnexin, hnRNP H, cytokeratin 15, ATP
synthase, keratin 19, triosephosphate isomerase, hnRNP F, liver glyc-
eraldehyde-3-phosphate dehydrogenase, glutathione S-transfer-
ase-π, and keratin 8; H (from left), plasma gelsolin, autoantigen cal-
reticulin, thioredoxin, and NAD+-dependent 15-hydroxyprostaglandin
dehydrogenase; I (from top), prolyl 4-hydroxylase β-subunit, cyto-
keratin 20, cytokeratin 17, prohibitin, and fructose 1,6-biphos-
phatase; J, annexin II; K, annexin IV; L (from top and left), 90-kDa heat
shock protein, prolyl 4-hydroxylase β-subunit, α-enolase, GRP 78,
cyclophilin, and cofilin.

gradient, and having a known chromosomal location, were 
selected for analysis in the TCC pair 827/532. Proteins were 
identified by a combination of methods (see "Experimental 
Procedures"). In general there was a highly significant 
correlation (p < 0.005) between mRNA and protein alterations (Fig. 
4). Only one gene showed disagreement between transcript 
alteration and protein alteration. Except for a group of 
cytokeratins encoded by genes on chromosome 17 (Fig. 5) the 
analyzed proteins did not belong to a particular family. 26 well 
focused proteins whose genes had a known chromosomal 
location were detected in TCCs 733 and 335, and of these 19 
correlated (p < 0.005) with the mRNA changes detected using 
the arrays (Fig. 4). For example, PA-FABP was highly 
expressed in the non-invasive TCC 335 but lost in the invasive 
counterpart (TCC 733; see Fig. 5). The smaller number of 
proteins detected in both 733 and 335 was because of the 
smaller size of the biopsies that were available. 

Fig. 5. Comparison of protein and transcript levels in invasive 
and non-invasive TCCs. The upper part of the figure shows a 2D gel 
(left) and the oligonucleotide array (right) of TCC 532. The red 
rectangles on the upper gel highlight the areas that are compared below. 
Identical areas of 2D gels of TCCs 532 and 827 are shown below. 
Clearly, cytokeratins 13 and 15 are strongly down-regulated in TCC 
827 (red annotation). The tile on the array containing probes for 
cytokeratin 15 is enlarged below the array (red arrow) from TCC 532 
and is compared with TCC 827. The upper row of squares in each tile 
corresponds to perfect match probes; the lower row corresponds to 
mismatch probes containing a mutation (used for correction for 
unspecific binding). Absence of signal is depicted as black, and the 
higher the signal the lighter the color. A high transcript level was 
detected in TCC 532 (6151 units) whereas a much lower level was 
detected in TCC 827 (absence of signals). For cytokeratin 13, a high 
transcript level was also present in TCC 532 (15659 units), and a 
much lower level was present in TCC 827 (623 units). The 2D gels at 
the bottom of the figure (left) show levels of PA-FABP and 
adipocyte-FABP in TCCs 335 and 733 (invasive), respectively. Both 
proteins are down-regulated in the invasive tumor. To the right we 
show the array tiles for the PA-FABP transcript. A medium transcript 
level was detected in the case of TCC 335 (1277 units) whereas very 
low levels were detected in TCC 733 (166 units). IEF, isoelectric 
focusing. 
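
The perfect match/mismatch probe design described in the Fig. 5 legend can be illustrated with a minimal sketch. It assumes that the specific signal of a tile is estimated as the mean of the perfect match minus mismatch differences, a common summary for such arrays that the text does not spell out. The function name and the intensities are hypothetical and are not data from this study.

```python
def average_difference(perfect_match, mismatch):
    """Mean of (PM - MM) over probe pairs: a simple estimate of specific signal.

    Assumption: one mismatch probe per perfect match probe, in the same order.
    """
    if len(perfect_match) != len(mismatch) or not perfect_match:
        raise ValueError("need equal-length, non-empty probe lists")
    diffs = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for one tile (not values from the paper).
pm_intensities = [1200.0, 900.0, 1500.0, 1100.0]
mm_intensities = [200.0, 150.0, 300.0, 250.0]
tile_signal = average_difference(pm_intensities, mm_intensities)
```

Subtracting the mismatch intensity removes the part of the signal that is due to unspecific binding, which is why a tile with similar brightness in both rows would be scored as having little or no transcript.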

11 chromosomal regions where CGH showed aberrations 
that corresponded to the changes in transcript levels also 
showed corresponding changes in the protein level (Table II). 
These regions included genes that encode proteins that are 
found to be frequently altered in bladder cancer, namely 
cytokeratins 17 and 20, annexins II and IV, and the fatty 
acid-binding proteins PA-FABP and FBP1. Four of these 
proteins were encoded by genes in chromosome 17q, a 
frequently amplified chromosomal area in invasive bladder 
cancers. 

DISCUSSION 

Most human cancers have abnormal DNA content, having 
lost some chromosomal parts and gained others. The present 
study provides some evidence as to the effect of these gains 
and losses on gene expression in two pairs of non-invasive 
and invasive TCCs using high throughput expression arrays 
and proteomics, in combination with CGH. In general, the 
results showed that there is a clear individual regulation of the 
mRNA expression of single genes, which in some cases was 
superimposed by a DNA copy number effect. In most cases, 
genes located in chromosomal areas with gains exhibited 
increased mRNA expression, whereas areas showing 
losses showed either no change or a reduced mRNA 
expression. The latter might be because losses most 
often are restricted to loss of one allele, and the cut-off point 
for detection of expression alterations was a 2-fold change, 
thus being at the border of detection. In several cases, 
however, an increase or decrease in DNA copy number was 
associated with de novo occurrence or complete loss of 
transcript, respectively. Some of these transcripts could not be 
detected in the non-invasive tumor but were present at 
relatively high levels in areas with DNA amplifications in the 
invasive tumors (e.g. in TCC 733 transcript from cellular ligand of 
annexin II gene (chromosome 1q21) from absent to 2670 
arbitrary units; in TCC 827 transcript from small proline-rich 
protein 1 gene (chromosome 1q12-q21.1) from absent to 
1326 arbitrary units). It may be anticipated from these data 
that significant clustering of genes with an increased 
expression to a certain chromosomal area indicates an increased 
likelihood of gain of chromosomal material in this area. 

Table II 
Proteins whose expression level correlates with both mRNA and gene dose changes 

Protein                Chromosomal  Tumor    CGH         Transcript         Protein 
                       location     TCC      alteration  alteration         alteration 
Annexin II             1q21         733      Gain        Abs to Pres        Increase 
Annexin IV             2p13         733      Gain        3.9-Fold up        Increase 
Cytokeratin 17         17q12-q21    827      Gain        3.8-Fold up        Increase 
Cytokeratin 20         17q21.1      827      Gain        5.6-Fold up        Increase 
(PA-)FABP              8q21.2       827      Loss        10-Fold down       Decrease 
FBP1                   9q22         827      Gain        2.3-Fold up        Increase 
Plasma gelsolin        9q31         827      Gain        Abs to Pres        Increase 
Heat shock protein 28  15q12-q13    827      Loss        2.5-Fold up        Decrease 
Prohibitin             17q21        827/733  Gain        3.7-/2.5-Fold up   Increase 
Prolyl-4-hydroxylase   17q25        827/733  Gain        5.7-/1.6-Fold up   Increase 
hnRNP B1               7p15         827      Loss        2.5-Fold down      Decrease 

In cases where corresponding alterations were found in both TCCs 827 and 733, these are shown as 827/733. 

Considering the many possible regulatory mechanisms 
acting at the level of transcription, it seems striking that the gene 
dose effects were so clearly detectable in gained areas. One 
hypothetical explanation may lie in the loss of controlled 
methylation in tumor cells (17-19). Thus, it may be possible 
that in chromosomes with increased DNA copy numbers two 
or more alleles could be demethylated simultaneously leading 
to a higher transcription level, whereas in chromosomes with 
losses the remaining allele could be partly methylated, turning 
off the process (20, 21). A recent report has documented a 
ploidy regulation of gene expression in yeast, but in this case all 
the genes were present in the same ratio (22), a situation that is 
not analogous to that of cancer cells, which show marked 
chromosomal aberrations, as well as gene dosage effects. 

Several CGH studies of bladder cancer have shown that 
some chromosomal aberrations are common at certain 
stages of disease progression, often occurring in more than 1 
of 3 tumors. In pTa tumors, these include 9p-, 9q-, 1q+, Y- 
(2, 6), and in pT1 tumors, 2q-, 11p-, 11q-, 1q+, 5p+, 8q+, 
17q+, and 20q+ (2-4, 6, 7). The pTa tumors studied here 
showed similar aberrations such as 9p- and 9q22-q33- and 
9q- and Y-, respectively. Likewise, the two minimally invasive 
pT1 tumors showed aberrations that are commonly seen at 
that stage, and TCC 827 had a remarkable resemblance to the 
commonly seen pattern of losses and gains, such as 1q22-24 
amplification (seen in both tumors), 11q14-q22 loss, the latter 
often linked to 17q+ (both tumors), and 1q+ and 9p-, often 
linked to 20q+ and 11q13+ (both tumors) (7-9). These 
observations indicate that the pairs of tumors used in this study 
exhibit chromosomal changes observed in many tumors, and 
therefore the findings could be of general importance for 
bladder cancer. 

Considering that the mapping resolution of CGH is about 
20 megabases, it is only possible to get a crude picture of 
chromosomal instability using this technique. Occasionally, 
we observed reduced transcript levels close to or inside 
regions with increased copy numbers. Analysis of these regions 
by positioning heterozygous microsatellites as close as 
possible to the locus showing reduced gene expression revealed 
loss of heterozygosity in several cases. It seems likely that 
multiple and different events occur along each chromosomal 
arm and that the use of cDNA microarrays for analysis of DNA 
copy number changes will reach a resolution that can resolve 
these changes, as has recently been proposed (2). The outlier 
data were not more frequent at the boundaries of the CGH 
aberrations. At present we do not know the mechanism 
behind chromosomal aneuploidy and cannot predict whether 
chromosomal gains will be transcribed to a larger extent than 
the two native alleles. A mechanism such as genetic imprinting has 
an impact on the expression level in normal cells and is often 
reduced in tumors. However, the relation between imprinting 
and gain of chromosomal material is not known. 

We regard it as a strength of this investigation that we were 
able to compare invasive tumors to benign tumors rather than 
to normal urothelium, as the tumors studied were biologically 
very close and probably represent successive steps in 
the progression of bladder cancer. Despite the limited amount 
of fresh tissue available it was possible to apply three different 
state of the art methods. The observed correlation between 
DNA copy number and mRNA expression is remarkable when 
one considers that different pieces of the tumor biopsies were 
used for the different sets of experiments. This indicates that 
bladder tumors are relatively homogeneous, a notion recently 
supported by CGH and LOH data that showed a remarkable 
similarity even between tumors and distant metastasis (10, 23). 

In the few cases analyzed, mRNA and protein levels 
showed a striking correspondence although in some cases 
we found discrepancies that may be attributed to translational 
regulation, post-translational processing, protein degrada- 
tion, or a combination of these. Some transcripts belong to 
undertranslated mRNA pools, which are associated with few 
translationally inactive ribosomes; these pools, however, 
seem to be rare (24). Protein degradation, for example, may 
be very important in the case of polypeptides with a short 
half-life (e.g. signaling proteins). A poor correlation between 
mRNA and protein levels was found in liver cells as deter- 
mined by arrays and 2D-PAGE (25), and a moderate correla- 
tion was recently reported by Ideker et al. (26) in yeast. 

Interestingly, our study revealed a much better correlation 
between gained chromosomal areas and increased mRNA 
levels than between loss of chromosomal areas and reduced 
mRNA levels. In general, the level of CGH change determined 
the ability to detect a change in transcript. One possible 
explanation could be that by losing one allele the change in 
mRNA level is not so dramatic as compared with gain of 
material, which can be rather unlimited and may lead to a 
severalfold increase in gene copy number resulting in a much 
higher impact on transcript level. The latter would be much 
easier to detect on the expression arrays as the cut-off point 
was placed at a 2-fold level so as not to be biased by noise on 
the array. Construction of arrays with a better signal to noise 
ratio may in the future allow detection of less than 2-fold 
alterations in transcript levels, a feature that may facilitate the 
analysis of the effect of loss of chromosomal areas on 
transcript levels. 

In eleven cases we found a significant correlation between 
DNA copy number, mRNA expression, and protein level. Four 
of these proteins were encoded by genes located at a 
frequently amplified area in chromosome 17q. Whether DNA 
copy number is one of the mechanisms behind alteration of 
these eleven proteins is at present unknown and will have to 
be proved by other methods using a larger number of 
samples. One factor making such studies complicated is the large 
extent of protein modification that occurs after translation, 
requiring immunoidentification and/or mass spectrometry to 
correctly identify the proteins in the gels. 
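
The gene-dose argument in the Discussion (loss of a single allele sits exactly at the 2-fold cut-off, whereas gains can be several-fold) can be made concrete with a small sketch. It assumes, purely for illustration, that transcript level scales linearly with gene copy number against a diploid background; the text itself notes that individual gene regulation often overrides this. The function names are hypothetical.

```python
import math


def expected_fold_change(tumor_copies, normal_copies=2):
    """Expected transcript ratio if expression were proportional to copy number."""
    return tumor_copies / normal_copies


def margin_over_cutoff(fold_change, cutoff=2.0):
    """Distance (in log2 units) by which a change exceeds the detection cut-off.

    Zero means the change lies exactly on the cut-off; negative means undetected.
    """
    return abs(math.log2(fold_change)) - math.log2(cutoff)


loss_of_one_allele = expected_fold_change(1)   # 2 copies -> 1 copy
gain_to_six_copies = expected_fold_change(6)   # 2 copies -> 6 copies

loss_margin = margin_over_cutoff(loss_of_one_allele)
gain_margin = margin_over_cutoff(gain_to_six_copies)
```

Under this simplification a one-allele loss gives a ratio of 0.5, which lies on the 2-fold boundary with no margin for array noise, while a gain to six copies exceeds the cut-off comfortably.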

In conclusion, the results presented in this study exemplify 
the large body of knowledge that may be possible to gather in 
the future by combining state of the art techniques that follow 
the pathway from DNA to protein (26). Here, we used a 
traditional chromosomal CGH method, but in the future high 
resolution CGH based on microarrays with many thousand radiation 
hybrid-mapped genes will increase the resolution and 
information derived from these types of experiments (2). Combined with 
expression arrays analyzing transcripts derived from genes with 
known locations, and 2D gel analysis to obtain information at 
the post-translational level, a clearer and more developed 
understanding of the tumor genome will be forthcoming. 

ABSTRACT 

Genetic changes underlie tumor progression and may lead to 
cancer-specific expression of critical genes. Over 1100 publications have 
described the use of comparative genomic hybridization (CGH) to analyze 
the pattern of copy number alterations in cancer, but very few of the genes 
affected are known. Here, we performed high-resolution CGH analysis on 
cDNA microarrays in breast cancer and directly compared copy number 
and mRNA expression levels of 13,824 genes to quantitate the impact of 
genomic changes on gene expression. We identified and mapped the 
boundaries of 24 independent amplicons, ranging in size from 0.2 to 12 
Mb. Throughout the genome, both high- and low-level copy number 
changes had a substantial impact on gene expression, with 44% of the 
highly amplified genes showing overexpression and 10.5% of the highly 
overexpressed genes being amplified. Statistical analysis with random 
permutation tests identified 270 genes whose expression levels across 14 
samples were systematically attributable to gene amplification. These 
included most previously described amplified genes in breast cancer and 
many novel targets for genomic alterations, including the HOXB7 gene, 
the presence of which in a novel amplicon at 17q21.3 was validated in 
10.2% of primary breast cancers and associated with poor patient 
prognosis. In conclusion, CGH on cDNA microarrays revealed hundreds of 
novel genes whose overexpression is attributable to gene amplification. 
These genes may provide insights to the clonal evolution and progression 
of breast cancer and highlight promising therapeutic targets. 

INTRODUCTION 

Gene expression patterns revealed by cDNA microarrays have 
facilitated classification of cancers into biologically distinct 
categories, some of which may explain the clinical behavior of the tumors 
(1-6). Despite this progress in diagnostic classification, the molecular 
mechanisms underlying gene expression patterns in cancer have 
remained elusive, and the utility of gene expression profiling in the 
identification of specific therapeutic targets remains limited. 

Accumulation of genetic defects is thought to underlie the clonal 
evolution of cancer. Identification of the genes that mediate the effects 
of genetic changes may be important by highlighting transcripts that 
are actively involved in tumor progression. Such transcripts and their 
encoded proteins would be ideal targets for anticancer therapies, as 
demonstrated by the clinical success of new therapies against 
amplified oncogenes, such as ERBB2 and EGFR (7, 8), in breast cancer and 
other solid tumors. Besides amplifications of known oncogenes, over 
20 recurrent regions of DNA amplification have been mapped in 
breast cancer by CGH (9, 10). However, these amplicons are often 
large and poorly defined, and their impact on gene expression remains 
unknown. 

Fig. 1. Impact of gene copy number on global gene expression levels. A, percentage of 
over- and underexpressed genes (Y axis) according to copy number ratios. 
Threshold values used for over- and underexpression were >2.18* (global upper 7% of 
the cDNA ratios) and <0.4826 (global lower 7% of the expression ratios). B, percentage 
of amplified and deleted genes according to expression ratios. Threshold values for 
amplification and deletion were >1.5 and <0.7. 

We hypothesized that genome-wide identification of those gene 
expression changes that are attributable to underlying gene copy 
number alterations would highlight transcripts that are actively 
involved in the causation or maintenance of the malignant phenotype. 
To identify such transcripts, we applied a combination of cDNA and 
CGH microarrays to: (a) determine the global impact that gene copy 
number variation plays in breast cancer development and progression; 
and (b) identify and characterize those genes whose mRNA 
expression is most significantly associated with amplification of the 
corresponding genomic template. 

3 The abbreviations used are: CGH, comparative genomic hybridization; FISH, 
fluorescence in situ hybridization; RT-PCR, reverse transcription-PCR. 

Fig. 2. Genome-wide copy number and expression in the MCF-7 [...] 
indicated with a dashed line. 

MATERIALS AND METHODS 

Breast Cancer Cell Lines. Fourteen breast cancer cell lines (BT-20, 
BT-474, HCC1428, Hs578t, MCF7, MDA-361, MDA-436, MDA-453, MDA-468, 
SKBR-3, T-47D, UACC812, ZR-75-1, and ZR-75-30) were obtained from the 
American Type Culture Collection (Manassas, VA). Cells were grown under 
recommended culture conditions. Genomic DNA and mRNA were isolated 
using standard protocols. 

Copy Number and Expression Analyses by cDNA Microarrays. The 
preparation and printing of the 13,824 cDNA clones on glass slides were 
performed as described (11-13). Of these clones, 244 represented 
uncharacterized expressed sequence tags, and the remainder corresponded to known 
genes. CGH experiments on cDNA microarrays were done as described (14, 
15). Briefly, 20 μg of genomic DNA from breast cancer cell lines and normal 
human WBCs were digested for 14-18 h with AluI and RsaI (Life 
Technologies, Inc., Rockville, MD) and purified by phenol/chloroform extraction. Six 
μg of digested cell line DNAs were labeled with Cy3-dUTP (Amersham 
Pharmacia) and normal DNA with Cy5-dUTP (Amersham Pharmacia) using 
the Bioprime Labeling kit (Life Technologies, Inc.). Hybridization (14, 15) and 
posthybridization washes (13) were done as described. For the expression 
analyses, a standard reference (Universal Human Reference RNA; Stratagene, 
La Jolla, CA) was used in all experiments. Forty μg of reference RNA were 
labeled with Cy3-dUTP and 3.5 ng of test mRNA with Cy5-dUTP, and the 
labeled cDNAs were hybridized on microarrays as described (13, 15). For both 
microarray analyses, a laser confocal scanner (Agilent Technologies, Palo 
Alto, CA) was used to measure the fluorescence intensities at the target 
locations using the DEARRAY software (16). After background subtraction, 
average intensities at each clone in the test hybridization were divided by the 
average intensity of the corresponding clone in the control hybridization. For 
the copy number analysis, the ratios were normalized on the basis of the 
distribution of ratios of all targets on the array and for the expression analysis 
on the basis of 88 housekeeping genes, which were spotted four times onto the 
array. Low quality measurements (i.e., copy number data with mean reference 
intensity <100 fluorescent units, and expression data with both test and 
reference intensity <100 fluorescent units and/or with spot size <50 units) 
were excluded from the analysis and were treated as missing values. The 
distributions of fluorescence ratios were used to define cutpoints for increased/ 
decreased copy number. Genes with CGH ratio >1.43 (representing the upper 
5% of the CGH ratios across all experiments) were considered to be amplified, 
and genes with ratio <0.73 (representing the lower 5%) were considered to be 
deleted. 
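
The ratio calculation, quality filter, and cutpoints described above can be sketched as follows. This is a simplified illustration only: it covers the copy number case, takes background-subtracted intensities as input, and omits the array-wide normalization step. The function names are hypothetical; the thresholds (reference intensity <100 units treated as missing, ratio >1.43 amplified, ratio <0.73 deleted) are the ones stated in the text.

```python
def copy_number_ratio(test_intensity, reference_intensity, min_reference=100.0):
    """Test/reference ratio for one clone, or None for a low-quality spot.

    Inputs are assumed to be background-subtracted mean intensities.
    """
    if reference_intensity < min_reference:
        return None  # treated as a missing value
    return test_intensity / reference_intensity


def classify_copy_number(ratio, amplified_cut=1.43, deleted_cut=0.73):
    """Label a clone using the cutpoints given in Materials and Methods."""
    if ratio is None:
        return "missing"
    if ratio > amplified_cut:
        return "amplified"
    if ratio < deleted_cut:
        return "deleted"
    return "unchanged"


# Hypothetical spots: (test, reference) intensities.
spots = [(300.0, 150.0), (100.0, 200.0), (500.0, 50.0), (210.0, 200.0)]
calls = [classify_copy_number(copy_number_ratio(t, r)) for t, r in spots]
```

Returning `None` for a low-quality spot keeps missing values distinct from ratios near 1, so that excluded measurements are never counted as unchanged.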

Statistical Analysis of CGH and cDNA Microarray Data. To evaluate 
the influence of copy number alterations on gene expression, we applied the 
following statistical approach. CGH and cDNA calibrated intensity ratios were 
log-transformed and normalized using median centering of the values in each 
cell line. Furthermore, cDNA ratios for each gene across all 14 cell lines were 
median centered. For each gene, the CGH data were represented by a vector 
that was labeled 1 for amplification (ratio, >1.43) and 0 for no amplification. 
Amplification was correlated with gene expression using the signal-to-noise 
statistics (1). We calculated a weight, $w_g$, for each gene as follows: 

$$w_g = \frac{m_{g1} - m_{g0}}{\sigma_{g1} + \sigma_{g0}}$$

where $m_{g1}$, $\sigma_{g1}$ and $m_{g0}$, $\sigma_{g0}$ denote the means and SDs for the expression 
levels for amplified and nonamplified cell lines, respectively. To assess the 
statistical significance of each weight, we performed 10,000 random 
permutations of the label vector. The probability that a gene had a larger or equal 
weight by random permutation than the original weight was denoted by α. A 
low α (<0.05) indicates a strong association between gene expression and 
amplification. 
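
The signal-to-noise weight and the permutation test described above can be sketched for a single gene as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the sample standard deviation (the text does not say which SD was used), requires at least two cell lines in each group, and the expression values are invented. Function names are hypothetical.

```python
import random
import statistics


def signal_to_noise(expression, labels):
    """Weight = (mean_amp - mean_non) / (sd_amp + sd_non) for one gene."""
    amp = [x for x, lab in zip(expression, labels) if lab == 1]
    non = [x for x, lab in zip(expression, labels) if lab == 0]
    if len(amp) < 2 or len(non) < 2:
        raise ValueError("need at least two cell lines in each group")
    return (statistics.mean(amp) - statistics.mean(non)) / (
        statistics.stdev(amp) + statistics.stdev(non)
    )


def permutation_alpha(expression, labels, n_perm=10000, seed=0):
    """Fraction of label permutations whose weight is >= the observed weight."""
    rng = random.Random(seed)
    observed = signal_to_noise(expression, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the number of amplified lines fixed
        if signal_to_noise(expression, shuffled) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical log ratios for one gene across 14 cell lines (not real data);
# the first four lines are labeled as amplified.
expression = [3.1, 2.8, 3.4, 2.9, 0.1, -0.2, 0.3, 0.0, -0.1, 0.2,
              -0.3, 0.1, 0.05, -0.15]
amplified = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

weight = signal_to_noise(expression, amplified)
# Fewer permutations than the 10,000 in the text, to keep the example fast.
alpha = permutation_alpha(expression, amplified, n_perm=2000)
```

Shuffling the label vector rather than the expression values preserves the number of amplified cell lines in every permutation, so each permuted weight is comparable with the observed one.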

Genomic Localization of cDNA Clones and Amplicon Mapping. Each 
cDNA clone on the microarray was assigned to a Unigene cluster using the 
Unigene Build 141. A database of genomic sequence alignment information 
for mRNA sequences was created from the August 2001 freeze of the 
University of California Santa Cruz's GoldenPath database. The chromosome and 
bp positions for each cDNA clone were then retrieved by relating these data 
sets. Amplicons were defined as a CGH copy number ratio >2.0 in at least two 
adjacent clones in two or more cell lines or a CGH ratio >2.0 in at least three 
adjacent clones in a single cell line. The amplicon start and end positions were 



• Internet address: bttp://research.nhgTUa.gp^ 

Table I Summary of independent amplicons in 14 breast cancer cell lines by 
CGH microarray 

Location            Start (Mb)   End (Mb)   Size (Mb) 
1p13                132.79       132.94     0.2 
1q21                173.92       177.25     3.3 
1q22                179.28       179.57     0.3 
3p14                71.94        74.00      2.7 
7p12.1-7p11.2       55.62        60.95      5.3 
7q31                125.73       130.96     5.2 
7q32                140.01       140.68     0.7 
8q21.11-8q21.13     86.45        92.46      6.0 
8q21.3              98.45        103.05 
8q23.3-8q24.14      129.88       142.15 
8q24.22             151.21       152.16     1.0 
9p13                38.65        39.25      0.6 
13q22-q31           77.15        81.38      4.2 
16q22               86.70        87.62      0.9 
17q11               29.30        30.85      1.6 
17q12-q21.2         39.79        42.80      3.0 
17q21.32-q21.33     52.47        55.80      3.3 
17q22-q23.3         63.81        69.70      5.9 
17q23.3-q24.3       69.93        74.99      5.1 
19q13               40.63        41.40      0.8 
20q11.22            34.59        35.85      1.3 
20q13.12            44.00        45.62      1.6 
20q13.12-q13.13     46.45        49.43      3.0 
20q13.2-q13.32      5U2          59;t2      7.8 


GENE EXPRESSION PATTERNS IN BREAST CANCER 

CGH were validated, with 1q21, 17q12-q21.2, 17q22-q23, 20q13.1, 
and 20q13.2 regions being most commonly amplified. Furthermore, 
the boundaries of these amplicons were precisely delineated. In 
addition, novel amplicons were identified at 9p13 (38.65-39.25 Mb), 
and 17q21.3 (52.47-55.80 Mb). 

Direct Identification of Putative Amplification Target Genes. 
The cDNA/CGH microarray technique enables the direct 
correlation of copy number and expression data on a gene-by-gene basis 
throughout the genome. We directly annotated high-resolution 
CGH plots with gene expression data using color coding. Fig. 2C 
shows that most of the amplified genes in the MCF-7 breast cancer 
cell line at 1p13, 17q22-q23, and 20q13 were highly 
overexpressed. A view of chromosome 7 in the MDA-468 cell line 
implicates EGFR as the most highly overexpressed and amplified 
gene at 7p11-p12 (Fig. 3A). In BT-474, the two known amplicons 
at 17q12 and 17q22-q23 contained numerous highly 
overexpressed genes (Fig. 3B). In addition, several genes, including the 
homeobox genes HOXB2 and HOXB7, were highly amplified in a 
previously undescribed independent amplicon at 17q21.3. HOXB7 
was systematically amplified (as validated by FISH, Fig. 1B, inset) 
as well as overexpressed (as verified by RT-PCR, data not shown) 
in BT-474, UACC812, and ZR-75-30 cells. Furthermore, this novel 



extended to include neighboring nonamplified clones (ratio, <1.5). The 
amplicon size determination was partially dependent on local clone density. 
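
The amplicon definition given in Materials and Methods (CGH ratio >2.0 in at least two adjacent clones in two or more cell lines, or in at least three adjacent clones in a single cell line) can be sketched as below. This is a simplified illustration: clones are assumed to be ordered by genomic position on one chromosome, missing values are `None`, and the extension of boundaries to neighboring nonamplified clones (ratio <1.5) is not implemented. All names and ratios are hypothetical.

```python
def runs_of_true(flags, min_len=1):
    """Return inclusive (start, end) index pairs for runs of True of at least min_len."""
    runs, start = [], None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


def call_amplicons(cell_lines, threshold=2.0):
    """cell_lines: dict of name -> list of CGH ratios in genomic order."""
    above = {
        name: [r is not None and r > threshold for r in ratios]
        for name, ratios in cell_lines.items()
    }
    n_clones = len(next(iter(above.values())))
    in_amplicon = [False] * n_clones

    # Rule: at least three adjacent clones above threshold in a single cell line.
    for flags in above.values():
        for start, end in runs_of_true(flags, min_len=3):
            for i in range(start, end + 1):
                in_amplicon[i] = True

    # Rule: the same two adjacent clones above threshold in two or more cell lines.
    for i in range(n_clones - 1):
        support = sum(flags[i] and flags[i + 1] for flags in above.values())
        if support >= 2:
            in_amplicon[i] = in_amplicon[i + 1] = True

    return runs_of_true(in_amplicon)


# Hypothetical ratios for nine adjacent clones in three cell lines.
example = {
    "line_A": [1.0, 2.5, 2.6, 1.1, 1.0, 3.0, 3.2, 2.9, 1.0],
    "line_B": [0.9, 2.2, 2.4, 1.0, 1.2, 1.0, 1.1, 1.0, 0.8],
    "line_C": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.8, 2.7],
}
amplicons = call_amplicons(example)
```

In the example, clones 1-2 qualify because two cell lines share the same amplified pair, and clones 5-7 qualify through a three-clone run in one cell line; the isolated pair in `line_C` alone does not define an amplicon.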

FISH. Dual-color interphase FISH to breast cancer cell lines was done as 
described (17). Bacterial artificial chromosome clone RP11-361K8 was 
labeled with SpectrumOrange (Vysis, Downers Grove, IL), and 
SpectrumOrange-labeled probe for EGFR was obtained from Vysis. 
SpectrumGreen-labeled chromosome 7 and 17 centromere probes (Vysis) were used as a 
reference. A tissue microarray containing 612 formalin-fixed, 
paraffin-embedded primary breast cancers (17) was applied in FISH analyses as described 
(18). The use of these specimens was approved by the Ethics Committee of the 
University of Basel and by the NIH. Specimens containing a 2-fold or higher 
increase in the number of test probe signals, as compared with corresponding 
centromere signals, in at least 10% of the tumor cells were considered to be 
amplified. Survival analysis was performed using the Kaplan-Meier method 
and the log-rank test. 
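
The FISH scoring criterion above can be expressed as a short sketch. It reads the criterion as a per-cell comparison (test probe signals divided by centromere signals, counted cell by cell), which is one plausible interpretation of the wording rather than a documented scoring protocol. The function name and the signal counts are hypothetical.

```python
def specimen_is_amplified(cells, min_ratio=2.0, min_fraction=0.10):
    """cells: list of (test_signals, centromere_signals) counted per tumor cell.

    A specimen is called amplified when at least min_fraction of the scored
    cells show a test/centromere signal ratio of min_ratio or more.
    """
    scored = [(test, cen) for test, cen in cells if cen > 0]
    if not scored:
        raise ValueError("no scorable cells")
    n_high = sum(1 for test, cen in scored if test / cen >= min_ratio)
    return n_high / len(scored) >= min_fraction


# Hypothetical counts: 3 of 20 cells carry extra test probe signals.
specimen_a = [(6, 2)] * 3 + [(2, 2)] * 17
# Hypothetical counts: only 1 of 20 cells carries extra signals.
specimen_b = [(6, 2)] * 1 + [(2, 2)] * 19

a_amplified = specimen_is_amplified(specimen_a)
b_amplified = specimen_is_amplified(specimen_b)
```

Normalizing to the centromere signal separates true gene amplification from a simple gain of the whole chromosome, where test and centromere counts rise together.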

RT-PCR. The HOXB7 expression level was determined relative to 
GAPDH. Reverse transcription and PCR amplification were performed using 
Access RT-PCR System (Promega Corp., Madison, WI) with 10 ng of mRNA 
as a template. HOXB7 primers were 5'-GAGCAGAGGGACTCGGACTT-3' 
and 5'-GCGTCAGGTAGCGATTGTAG-3'. 



RESULTS 

Global Effect of Copy Number on Gene Expression. 13,824 
arrayed cDNA clones were applied for analysis of gene expression 
and gene copy number (CGH microarrays) in 14 breast cancer cell 
lines. The results illustrate a considerable influence of copy number 
on gene expression patterns. Up to 44% of the highly amplified 
transcripts (CGH ratio, >2.5) were overexpressed (i.e., belonged to 
the global upper 7% of expression ratios), compared with only 6% for 
genes with normal copy number levels (Fig. 1A). Conversely, 10.5% 
of the transcripts with high-level expression (cDNA ratio, >10) 
showed increased copy number (Fig. 1B). Low-level copy number 
increases and decreases were also associated with similar, although 
less dramatic, outcomes on gene expression (Fig. 1). 
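
The two percentages reported above answer different questions: what share of highly amplified genes is overexpressed, and what share of highly expressed genes is amplified. A minimal sketch of both calculations follows. It simplifies the analysis to one fixed cutoff per axis (CGH ratio >2.5 and cDNA ratio >10, both taken from the text), whereas the study also used percentile-based thresholds. The gene values are invented and the function names are hypothetical.

```python
def percent(part, whole):
    """Percentage of part in whole; NaN when the denominator is empty."""
    return 100.0 * part / whole if whole else float("nan")


def overlap_percentages(genes, cgh_cut=2.5, expression_cut=10.0):
    """genes: list of (cgh_ratio, expression_ratio) pairs, one per gene.

    Returns (% of amplified genes that are overexpressed,
             % of overexpressed genes that are amplified).
    """
    amplified = [g for g in genes if g[0] > cgh_cut]
    overexpressed = [g for g in genes if g[1] > expression_cut]
    both = [g for g in amplified if g[1] > expression_cut]
    return (
        percent(len(both), len(amplified)),
        percent(len(both), len(overexpressed)),
    )


# Hypothetical genes (not data from the study).
toy_genes = [(3.0, 12.0), (2.8, 1.0), (1.0, 11.0), (1.0, 15.0), (1.1, 1.0)]
amp_overexpressed, over_amplified = overlap_percentages(toy_genes)
```

The two values differ because the denominators differ: many genes are overexpressed for reasons other than amplification, so the second percentage is typically the smaller one, as in the reported 44% versus 10.5%.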

Identification of Distinct Breast Cancer Amplicons. Base-pair 
locations obtained for 11,994 cDNAs (86.8%) were used to plot copy 
number changes as a function of genomic position (Fig. 2, 
Supplement Fig. A). The average spacing of clones throughout the genome 
was 267 kb. This high-resolution mapping identified 24 independent 
breast cancer amplicons, spanning from 0.2 to 12 Mb of DNA (Table 
I). Several amplification sites detected previously by chromosomal 



Fig. 4. List of 50 genes with a statistically 
significant correlation (α value <0.05) between 
gene copy number and gene expression. Name, 
chromosomal location, and the α value for each 
gene are indicated. The genes have been ordered 
according to their position in the genome. The color 
maps on the right illustrate the copy number and 
expression ratio patterns in the 14 cell lines. The 
key to the color code is shown at the bottom of the 
graph. Gray squares, missing values. The complete 
list of 270 genes is shown in supplemental Fig. B. 

amplification was validated to be present in 10.2% of 363 primary 
breast cancers by FISH to a tissue microarray and was associated 
with poor prognosis of the patients (P = 0.001). 

Statistical Identification and Characterization of 270 Highly 
Expressed Genes in Amplicons. Statistical comparison of 
expression levels of all genes as a function of gene amplification identified 
270 genes whose expression was significantly influenced by copy 
number across all 14 cell lines (Fig. 4, Supplemental Fig. B). 
According to the gene ontology data, 91 of the 270 genes represented 
hypothetical proteins or genes with no functional annotation, whereas 
179 had associated functional information available. Of these, 151 
(84%) are implicated in apoptosis, cell proliferation, signal 
transduction, and transcription, whereas 28 (16%) had functional annotations 
that could not be directly linked with cancer. 



DISCUSSION 

The importance of recurrent gene and chromosome copy number 
changes in the development and progression of solid tumors has been 
characterized in >1000 publications applying CGH (9, 10), as well 
as in a large number of other molecular cytogenetic, cytogenetic, and 
molecular genetic studies. The effects of these somatic genetic 
changes on gene expression levels have remained largely unknown, 
although a few studies have explored gene expression changes 
occurring in specific amplicons (15, 19-21). Here, we applied 
genome-wide cDNA microarrays to identify transcripts whose expression 
changes were attributable to underlying gene copy number alterations 
in breast cancer. 

The overall impact of copy number on gene expression patterns was 
substantial with the most dramatic effects seen in the case of 
high-level copy number increase. Low-level copy number gains and losses 
also had a significant influence on expression levels of genes in the 
regions affected, but these effects were more subtle on a gene-by-gene 
basis than those of high-level amplifications. However, the impact of 
low-level gains on the dysregulation of gene expression patterns in 
cancer may be equally important if not more important than that of 
high-level amplifications. Aneuploidy and low-level gains and losses 
of chromosomal arms represent the most common types of genetic 
alterations in breast and other cancers and, therefore, have an 
influence on many genes. Our results in breast cancer extend the recent 
studies on the impact of aneuploidy on global gene expression 
patterns in yeast cells, acute myeloid leukemia, and a prostate cancer 
model system (22-24). 

Internet address: http://www.geneontology.org/. 

Internet address: http://www.ncbi.nlm.nih.gov/entrez. 

The CGH microarray analysis identified 24 independent breast 
cancer amplicons. We defined the precise boundaries for many am- 
plicons detected previously by chromosomal CGH (9, 10, 25, 26) and 
also discovered novel amplicons that had not been detected previ- 
ously, presumably because of their small size (only 1-2 Mb) or close 
proximity to other larger amplicons. One of these novel amplicons 
involved the homeobox gene region at 17q21.3 and led to the overexpression
of the HOXB7 and HOXB2 genes. The homeodomain
transcription factors are known to be key regulators of embryonic 
development and have been occasionally reported to undergo aberrant 
expression in cancer (27, 28). HOXB7 transfection induced cell pro- 
liferation in melanoma, breast, and ovarian cancer cells and increased 
tumorigenicity and angiogenesis in breast cancer (29-32). The present
results imply that gene amplification may be a prominent mechanism
for overexpressing HOXB7 in breast cancer and suggest that
HOXB7 contributes to tumor progression and confers an aggressive 
disease phenotype in breast cancer. This view is supported by our 
finding of amplification of HOXB7 in 10% of 363 primary breast 
cancers, as well as an association of amplification with poor prognosis 
of the patients. 

We carried out a systematic search to identify genes whose 
expression levels across all 14 cell lines were attributable to 
amplification status. Statistical analysis revealed 270 such genes 
(representing ~2% of all genes on the array), including not only
previously described amplified genes, such as HER-2, MYC,
EGFR, ribosomal protein s6 kinase, and AIB3, but also numerous
novel genes such as NRAS-related gene (1p13), syndecan-2 (8q22),
and bone morphogenic protein (20q13.1), whose activation by
amplification may similarly promote breast cancer progression. 
Most of the 270 genes have not been implicated previously in 
breast cancer development and suggest novel pathogenetic mech- 
anisms. Although we would not expect all of them to be causally 
involved, it is intriguing that 84% of the genes with associated 
functional information were implicated in apoptosis, cell prolifer- 
ation, signal transduction, transcription, or other cellular processes 
that could directly imply a possible role in cancer progression. 
Therefore, a detailed characterization of these genes may provide 
biological insights to breast cancer progression and might lead to 
the development of novel therapeutic strategies. 

In summary, we demonstrate application of cDNA microarrays
to the analysis of both copy number and expression levels of over 
12,000 transcripts throughout the breast cancer genome, roughly 
once every 267 kb. This analysis provided: (a) evidence of a 
prominent global influence of copy number changes on gene
expression levels; (b) a high-resolution map of 24 independent 
amplicons in breast cancer, and (c) identification of a set of 270 
genes, the overexpression of which was statistically attributable to 
gene amplification. Characterization of a novel amplicon at
17q21.3 implicated amplification and overexpression of the
HOXB7 gene in breast cancer, including a clinical association
between HOXB7 amplification and poor patient prognosis. Overall,
our results illustrate how the identification of genes activated by 
gene amplification provides a powerful approach to highlight 
genes with an important role in cancer as well as to prioritize and 
validate putative targets for therapy development. 
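
The "roughly once every 267 kb" figure in the summary above can be checked with simple arithmetic. The sketch below assumes a haploid human genome length of about 3.2 Gb; that value is an assumption for illustration and is not stated in the text.

```python
# Assumed haploid human genome length in kilobases (~3.2 Gb).
# This value is an assumption; the text gives only the transcript count.
GENOME_KB = 3_200_000

# "over 12,000 transcripts" from the summary paragraph.
N_TRANSCRIPTS = 12_000


def mean_spacing_kb(genome_kb: float, n_probes: int) -> float:
    """Average distance between mapped transcripts, in kilobases."""
    return genome_kb / n_probes


spacing = mean_spacing_kb(GENOME_KB, N_TRANSCRIPTS)
print(f"{spacing:.0f} kb")
```

Under this assumed genome length the result rounds to 267 kb, which agrees with the spacing quoted in the summary.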
Genomic DNA copy number alterations are key genetic events in
the development and progression of human cancers. Here we
report a genome-wide microarray comparative genomic hybridization
(array CGH) analysis of DNA copy number variation in
a series of primary human breast tumors. We have profiled DNA 
copy number alteration across 6,691 mapped human genes, in 44
predominantly advanced, primary breast tumors and 10 breast 
cancer cell lines. While the overall patterns of DNA amplification 
and deletion corroborate previous cytogenetic studies, the high- 
resolution (gene-by-gene) mapping of amplicon boundaries and 
the quantitative analysis of amplicon shape provide significant 
improvement in the localization of candidate oncogenes. Parallel 
microarray measurements of mRNA levels reveal the remarkable 
degree to which variation in gene copy number contributes to 
variation in gene expression in tumor cells. Specifically, we find 
that 62% of highly amplified genes show moderately or highly 
elevated expression, that DNA copy number influences gene ex- 
pression across a wide range of DNA copy number alterations 
(deletion, low-, mid- and high-level amplification), that on average, 
a 2-fold change in DNA copy number is associated with a corresponding
1.5-fold change in mRNA levels, and that overall, at least
12% of all the variation in gene expression among the breast
tumors is directly attributable to underlying variation in gene copy
number. These findings provide evidence that widespread DNA 
copy number alteration can lead directly to global deregulation of 
gene expression, which may contribute to the development or 
progression of cancer. 

Conventional cytogenetic techniques, including comparative 
genomic hybridization (CGH) (1), have led to the identifi- 
cation of a number of recurrent regions of DNA copy number 
alteration in breast cancer cell lines and tumors (2-4). While 
some of these regions contain known or candidate oncogenes 
[e.g., FGFR1 (8p11), MYC (8q24), CCND1 (11q13), ERBB2
(17q12), and ZNF217 (20q13)] and tumor suppressor genes
[RB1 (13q14) and TP53 (17p13)], the relevant gene(s) within
other regions (e.g., gain of 1q, 8q22, and 17q22-24, and loss of
8p) remain to be identified. A high-resolution genome-wide 
map, delineating the boundaries of DNA copy number alter- 
ations in tumors, should facilitate the localization and identifi- 
cation of oncogenes and tumor suppressor genes in breast 
cancer. In this study, we have created such a map, using 
array-based CGH (5-7) to profile DNA copy number alteration 
in a series of breast cancer cell lines and primary tumors. 

An unresolved question is the extent to which the widespread 
DNA copy number changes that we and others have identified 
in breast tumors alter expression of genes within involved 
regions. Because we had measured mRNA levels in parallel in 
the same samples (8), using the same DNA microarrays, we had
an opportunity to explore on a genomic scale the relationship 
between DNA copy number changes and gene expression. From
this analysis, we have identified a significant impact of widespread
DNA copy number alteration on the transcriptional
programs of breast tumors. 

Materials and Methods 

Tumors and Cell Lines. Primary breast tumors were predominantly 
large (>3 cm), intermediate-grade, infiltrating ductal carcino- 
mas, with more than 50% being lymph node positive. The 
fraction of tumor cells within specimens averaged at least 50%. 
Details of individual tumors have been published (8, 9), and 
are summarized in Table 1, which is published as supporting 
information on the PNAS web site, www.pnas.org. Breast cancer 
cell lines were obtained from the American Type Culture 
Collection. Genomic DNA was isolated either using Qiagen 
genomic DNA columns, or by phenol/chloroform extraction 
followed by ethanol precipitation. 

DNA Labeling and Microarray Hybridizations. Genomic DNA label- 
ing and hybridizations were performed essentially as described 
in Pollack et al. (7), with slight modifications. Two micrograms
of DNA was labeled in a total volume of 50 microliters and the
volumes of all reagents were adjusted accordingly. "Test" DNA
(from tumors and cell lines) was fluorescently labeled (Cy5) and 
hybridized to a human cDNA microarray containing 6,691 
different mapped human genes (i.e., UniGene clusters). The 
"reference" (labeled with Cy3) for each hybridization was nor- 
mal female leukocyte DNA from a single donor. The fabrication 
of cDNA microarrays and the labeling and hybridization of 
mRNA samples have been described (8). 

Data Analysis and Map Positions. Hybridized arrays were scanned 
on a GenePix scanner (Axon Instruments, Foster City, CA), and 
fluorescence ratios (test/reference) calculated using ScanAlyze
software (available at http://rana.lbl.gov). Fluorescence ratios
were normalized for each array by setting the average log 
fluorescence ratio for all array elements equal to 0. Measure- 
ments with fluorescence intensities more than 20% above back- 
ground were considered reliable. DNA copy number profiles 
that deviated significantly from background ratios measured in 
normal genomic DNA control hybridizations were interpreted as 
evidence of real DNA copy number alteration (see Estimating 
Significance of Altered Fluorescence Ratios in the supporting 
information). When indicated, DNA copy number profiles are 
displayed as a moving average (symmetric 5-nearest neighbors). 
Map positions for arrayed human cDNAs were assigned by 



Abbreviation: CGH, comparative genomic hybridization.

identifying the starting position of the best and longest match of
any DNA sequence represented in the corresponding UniGene
cluster (10) against the "Golden Path" genome assembly
(http://genome.ucsc.edu/; Oct 7, 2000 Freeze). For UniGene
clusters represented by multiple arrayed elements, mean fluorescence
ratios (for all elements representing the same UniGene
cluster) are reported. For mRNA measurements, fluorescence 
ratios are "mean-centered" (i.e., reported relative to the mean 
ratio across the 44 tumor samples). The data set described here 
can be accessed in its entirety in the supporting information. 
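
The normalization, smoothing, and mean-centering steps described in this section can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes base-2 logarithms, reads "symmetric 5-nearest neighbors" as a 5-point window centered on each gene, and applies mean-centering in log space; none of these choices is specified in the text, and all function names are hypothetical.

```python
import math
from statistics import fmean


def normalize_log_ratios(ratios):
    """Shift log2(test/reference) ratios so the array-wide mean is 0.

    Base 2 is an assumption; the text says only "log fluorescence ratio".
    """
    log_r = [math.log2(r) for r in ratios]
    center = fmean(log_r)
    return [x - center for x in log_r]


def moving_average(values, half_window=2):
    """Symmetric moving average along map order.

    half_window=2 averages each gene with 2 neighbors on either side
    (a 5-point window, one reading of "symmetric 5-nearest neighbors").
    Windows are truncated at the ends of the profile.
    """
    n = len(values)
    return [
        fmean(values[max(0, i - half_window): min(n, i + half_window + 1)])
        for i in range(n)
    ]


def mean_center(rows):
    """Report each gene (row) relative to its mean across samples."""
    return [[x - fmean(row) for x in row] for row in rows]


# Hypothetical fluorescence ratios for three array elements.
normalized = normalize_log_ratios([1.0, 2.0, 4.0])
smoothed = moving_average([0.0, 0.0, 5.0, 0.0, 0.0])
centered = mean_center([[1.0, 3.0], [2.0, 2.0]])
```

Smoothing trades resolution for noise reduction: a single-gene spike is spread across its window, so narrow amplicons are flattened in the moving-average display.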

Results 

We performed CGH on 44 predominantly locally advanced, 
primary breast tumors and 10 breast cancer cell lines, using
cDNA microarrays containing 6,691 different mapped human
genes (Fig. 1a; also see Materials and Methods for details of
microarray hybridizations). To take full advantage of the im- 
proved spatial resolution of array CGH, we ordered (fluores- 
cence ratios for) the 6,691 cDNAs according to the "Golden 
Path" (http://genome.ucsc.edu/) genome assembly of the draft 
human genome sequences (11). In so doing, arrayed cDNAs not 
only themselves represent genes of potential interest (e.g., 
candidate oncogenes within amplicons), but also provide precise 
genetic landmarks for chromosomal regions of amplification and 

12964 | www.pnas.org/cgi/doi/10.1073/pnas.162471999



deletion. Parallel analysis of DNA from cell lines containing 
different numbers of X chromosomes (Fig. 1b), as we did before
(7), demonstrated the sensitivity of our method to detect single-
copy loss (45,XO), and 1.5- (47,XXX), 2- (48,XXXX), or
2.5-fold (49,XXXXX) gains (also see Fig. 5, which is published
as supporting information on the PNAS web site). Fluorescence 
ratios were linearly proportional to copy number ratios, which 
were slightly underestimated, in agreement with previous ob- 
servations (7). Numerous DNA copy number alterations were 
evident in both the breast cancer cell lines and primary tumors 
(Fig. 1a), detected in the tumors despite the presence of euploid
non-tumor cell types; the magnitudes of the observed changes
were generally lower in the tumor samples. DNA copy-number
alterations were found in every cancer cell line and tumor, and 
on every human chromosome in at least one sample. Recurrent 
regions of DNA copy number gain and loss were readily identifiable.
For example, gains within 1q, 8q, 17q, and 20q were
observed in a high proportion of breast cancer cell lines/tumors
(90%/69%, 100%/47%, 100%/60%, and 90%/44%, respectively),
as were losses within 1p, 3p, 8p, and 13q (80%/24%,
80%/22%, 80%/22%, and 70%/18%, respectively), consistent 
with published cytogenetic studies (refs. 2-4; a complete listing 
of gains/losses is provided in Tables 2 and 3, which are published 
as supporting information on the PNAS web site). The total
number of genomic alterations (gains and losses) was found to
be significantly higher in breast tumors that were high grade (P =
0.008), consistent with published CGH data (3), estrogen receptor
negative (P <= 0.04), and harboring TP53 mutations (P =
0.0006) (see Table 4, which is published as supporting information
on the PNAS web site).

Fig. 2. DNA copy number alteration across chromosome 8 by array CGH.

The improved spatial resolution of our array CGH analysis is 
illustrated for chromosome 8, which displayed extensive DNA 
copy number alteration in our series. A detailed view of the 
variation in the copy number of 241 genes mapping to chromo- 
some 8 revealed multiple regions of recurrent amplification; 
each of these potentially harbors a different known or previously 
uncharacterized oncogene (Fig. 2a). The complexity of amplicon
structure is most easily appreciated in the breast cancer cell line
SKBR3. Although a conventional CGH analysis of 8q in SKBR3
identified only two distinct regions of amplification (12), we
observed three distinct regions of high-level amplification (labeled
1-3 in Fig. 2b). For each of these regions we can define the
boundaries of the interval recurrently amplified in the tumors we
examined; in each case, known or plausible candidate oncogenes 
can be identified (a description of these regions, as well as the 
recurrently amplified regions on chromosomes 17 and 20, can be 
found in Figs. 6 and 7, which are published as supporting 
information on the PNAS web site). 

For a subset of breast cancer cell lines and tumors (4 and 37, 
respectively), and a subset of arrayed genes (6,095), mRNA 
levels were quantitatively measured in parallel by using cDNA 
microarrays (8). The parallel assessment of mRNA levels is 
useful in the interpretation of DNA copy number changes. For 
example, the highly amplified genes that are also highly ex- 
pressed are the strongest candidate oncogenes within an ampli- 
con. Perhaps more significantly, our parallel analysis of DNA 
copy number changes and mRNA levels provides us the oppor- 
tunity to assess the global impact of widespread DNA copy 
number alteration on gene expression in tumor cells. 

A strong influence of DNA copy number on gene expression 
is evident in an examination of the pseudocolor representations
of DNA copy number and mRNA levels for genes on chromosome
17 (Fig. 3). The overall patterns of gene amplification and
elevated gene expression are quite concordant; i.e., a significant
fraction of highly amplified genes appear to be correspondingly 
highly expressed. The concordance between high-level amplifi- 
cation and increased gene expression is not restricted to chro- 
mosome 17. Genome-wide, of 117 high-level DNA amplifica- 
tions (fluorescence ratios >4, and representing 91 different 
genes), 62% (representing 54 different genes; see Table 5, which 
is published as supporting information on the PNAS web site) 
are found associated with at least moderately elevated mRNA 
levels (mean-centered fluorescence ratios >2), and 42% (rep- 
resenting 36 different genes) are found associated with compa- 
rably highly elevated mRNA levels (mean-centered fluorescence 
ratios >4). 
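
The concordance figures above come from applying two cutoffs: a DNA fluorescence ratio >4 to call high-level amplification, and a mean-centered mRNA ratio >2 (or >4) to call elevated expression. A minimal sketch of that tally follows. The function name and the example measurements are hypothetical and are not data from the study.

```python
def concordance(pairs, dna_cutoff=4.0, mrna_cutoff=2.0):
    """Fraction of high-level amplifications that show elevated mRNA.

    pairs: iterable of (dna_ratio, mean_centered_mrna_ratio), one entry
    per gene-by-sample measurement.
    """
    amplified = [(d, m) for d, m in pairs if d > dna_cutoff]
    if not amplified:
        return float("nan")
    hits = sum(1 for _, m in amplified if m > mrna_cutoff)
    return hits / len(amplified)


# Hypothetical measurements (not from the paper).
example = [(6.0, 5.0), (5.0, 2.5), (4.5, 1.2), (8.0, 3.0), (1.1, 4.0)]

moderate = concordance(example)                # mRNA ratio > 2
high = concordance(example, mrna_cutoff=4.0)   # mRNA ratio > 4
print(moderate, high)
```

The last example entry is ignored because its DNA ratio falls below the amplification cutoff, even though its expression is elevated. The tally is conditioned on amplification, as in the text.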

To determine the extent to which DNA deletion and lower- 
level amplification (in addition to high-level amplification) are 
also associated with corresponding alterations in mRNA levels, 
we performed three separate analyses on the complete data set 
(4 cell lines and 37 tumors, across 6,095 genes). First, we 
determined the average mRNA levels for each of five classes 
of genes, representing DNA deletion, no change, and low-, 
medium-, and high-level amplification (Fig. 4a). For both the
breast cancer cell lines and tumors, average mRNA levels
tracked with DNA copy number across all five classes, in a
statistically significant fashion (P values for pair-wise Student's
t tests comparing adjacent classes: cell lines, 4 x 10"*', 1 x 10 ,
5 x 10- J . 1 X 10-*; tumors, 1 x 10" 43 , 1 X 10"" 4 , 5 x 10" 41 , 
1 X 10"*). A linear regression of the average log(DNA copy 
number), for each class, against average log(mRNA level) 
demonstrated that on average, a 2-fold change in DNA copy 
number was accompanied by 1.4- and 1.5-fold changes in mRNA
level for the breast cancer cell lines and tumors, respectively (Fig. 
4a, regression line not shown). Second, we characterized the 
distribution of the 6,095 correlations between DNA copy number
and mRNA level, each across the 37 tumor samples (Fig. 4b).
The distribution of correlations forms a normal-shaped curve, 
but with the peak markedly shifted in the positive direction from 
zero. This shift is statistically significant, as evidenced in a plot 
of observed vs. expected correlations (Fig. 4c), and reflects a 
pervasive global influence of DNA copy number alterations on 
gene expression. Notably, the highest correlations between DNA 
copy number and mRNA level (the right tail of the distribution 
in Fig. 4b) comprise both amplified and deleted genes (data not
shown). Third, we used a linear regression model to estimate the 
fraction of all variation measured in mRNA levels among the 37
tumors that could be attributed to underlying variation in DNA
copy number. From this analysis, we estimate that, overall, about
7% of all of the observed variation in mRNA levels can be 
explained directly by variation in copy number of the altered 
genes (Fig. 4d). We can reduce the effects of experimental 
measurement error on this estimate by using only that fraction 
of the data most reliably measured (fluorescence intensity/ 
background >3); using that data, our estimate of the percent 
variation in mRNA levels directly attributed to variation in gene 
copy number increases to 12% (Fig. 4d). This still undoubtedly
represents a significant underestimate, as the observed variation 
in global gene expression is affected not only by true variation in 
the expression programs of the tumor cells themselves, but also 
by the variable presence of non-tumor cell types within clinical 
samples. 
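
Two quantities in the analyses above follow from a straight-line fit in log space: the fold change in mRNA per doubling of DNA copy number, which is 2 raised to the fitted slope, and the fraction of variation explained, which is the squared correlation. The sketch below is illustrative only. It uses ordinary least squares on constructed values and assumes base-2 logarithms; it is not the authors' regression model, and the inputs are not data from the study.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    r2 = (sxy ** 2) / (sxx * syy) if syy > 0 else float("nan")
    return a, b, r2


def mrna_fold_per_dna_doubling(slope):
    """Convert a log2-log2 slope into fold change in mRNA per 2-fold DNA change."""
    return 2.0 ** slope


# Constructed class averages (log2 DNA ratio, log2 mRNA ratio), built so
# that a 2-fold DNA change corresponds to exactly a 1.5-fold mRNA change.
log2_dna = [-1.0, 0.0, 1.0, 2.0, 3.0]
log2_mrna = [math.log2(1.5) * v for v in log2_dna]

intercept, slope, r_squared = fit_line(log2_dna, log2_mrna)
fold = mrna_fold_per_dna_doubling(slope)
print(round(fold, 2), round(r_squared, 2))
```

A 1.5-fold change per doubling corresponds to a slope of about 0.585 in log2 units. The constructed example has no scatter, so its squared correlation is 1. Real measurements include noise and other sources of expression variation, which is why the fraction attributed to copy number in the text is far smaller.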

Discussion 

This genome-wide, array CGH analysis of DNA copy number 
alteration in a series of human breast tumors demonstrates the 
usefulness of defining amplicon boundaries at high resolution 
(gene-by-gene), and quantitatively measuring amplicon shape, to 
assist in locating and identifying candidate oncogenes. By ana- 
lyzing mRNA levels in parallel, we have also discovered that 
changes in DNA copy number have a large, pervasive, direct 
effect on global gene expression patterns in both breast cancer
cell lines and tumors. Although the DNA microarrays used in our
analysis may display a bias toward characterized and/or highly 
expressed genes, because we are examining such a large fraction 
of the genome (approximately 20% of all human genes), and 
because, as detailed above, we are likely underestimating the 
contribution of DNA copy number changes to altered gene 
expression, we believe our findings are likely to be generalizable 
(but would nevertheless still be remarkable if only applicable to 
this set of ~6,100 genes).

In budding yeast, aneuploidy has been shown to result in 
chromosome-wide gene expression biases (13). Two recent 
studies have begun to examine the global relationship between 
DNA copy number and gene expression in cancer cells. In 
agreement with our findings, Phillips et al. (14) have shown that
with the acquisition of tumorigenicity in an immortalized pros- 
tate epithelial cell line, new chromosomal gains and losses 
resulted in a statistically significant respective increase and 
decrease in the average expression level of involved genes. In 
contrast, Platzer et al. (15) recently reported that in metastatic
colon tumors only ~4% of genes within amplified regions were
found more highly (>2-fold) expressed, when compared with 
normal colonic epithelium. This report differs substantially from 
our finding that 62% of highly amplified genes in breast cancer 
exhibit at least 2-fold increased expression. These contrasting 
findings may reflect methodological differences between the
studies. For example, the study of Platzer et al. (15) may have
systematically under-measured gene expression changes. In this 
regard it is remarkable that only 14 transcripts of many thousand 
residing within unamplified chromosomal regions were found to 
exhibit at least 4-fold altered expression in metastatic colon 
cancer. Additionally, their reliance on lower-resolution chromo- 
somal CGH may have resulted in poorly delimiting the bound- 
aries of high-complexity amplicons, effectively overcalling re- 
gions with amplification. Alternatively, the contrasting findings 
for amplified genes may represent real biological differences 
between breast and metastatic colon tumors; resolution of this 
issue will require further studies. 

Our finding that widespread DNA copy number alteration has 
a large, pervasive and direct effect on global gene expression 
patterns in breast cancer has several important implications. 
First, this finding supports a high degree of copy number- 
dependent gene expression in tumors. Second, it suggests that 
most genes are not subject to specific autoregulation or dosage 
compensation. Third, this finding cautions that elevated expres- 
sion of an amplified gene cannot alone be considered strong 
independent evidence of a candidate oncogene's role in tumor- 
igenesis. In our study, fully 62% of highly amplified genes 
demonstrated moderately or highly elevated expression. This 
highlights the importance of high-resolution mapping of ampli- 
con boundaries and shape [to identify the "driving" gene(s) 
within amplicons (16)], on a large number of samples, in addition 
to functional studies. Fourth, this finding suggests that analyzing
the genomic distribution of expressed genes, even within existing
microarray gene expression data sets, may permit the inference 
of DNA copy number aberration, particularly aneuploidy (where 
gene expression can be averaged across large chromosomal 
regions; see Fig. 3 and supporting information). Fifth, this 
finding implies that a substantial portion of the phenotypic 
uniqueness (and by extension, the heterogeneity in clinical 
behavior) among patients' tumors may be traceable to underlying
variation in DNA copy number. Sixth, this finding supports
a possible role for widespread DNA copy number alteration in 
tumorigenesis (17, 18), beyond the amplification of specific 
oncogenes and deletion of specific tumor suppressor genes. 
Widespread DNA copy number alteration, and the concomitant 
widespread imbalance in gene expression, might disrupt critical 
stoichiometric relationships in cell metabolism and physiology
(e.g., proteasome, mitotic spindle), possibly promoting further
chromosomal instability and directly contributing to tumor 
development or progression. Finally, our findings suggest the 
possibility of cancer therapies that exploit specific or global 
imbalances in gene expression in cancer. 

We thank the many members of the P.O.B. and D.B. labs for helpful 
discussions. J.R.P. was a Howard Hughes Medical Institute Physician 
Postdoctoral Fellow during a portion of this work. P.O.B. is a Howard 
Hughes Medical Institute Associate Investigator. This work was 
supported by grants from the National Institutes of Health, the Howard 
Hughes Medical Institute, the Norwegian Cancer Society, and the 
Norwegian Research Council.

T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., et al. (2001) Proc. Natl. Acad.
Sci. USA 98, 10869-10874.

10. Schuler, G. D. (1997) J. Mol. Med. 75, 694-698.

11. Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin,
J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001) Nature
(London) 409, 860-921.

12. Fejzo, M. S., Godfrey, T., Chen, C., Waldman, F. & Gray, J. W. (1998) Genes
Chromosomes Cancer 22, 105-113.

13. Hughes, T. R., Roberts, C. J., Dai, H., Jones, A. R., Meyer, M. R., Slade, D.,
Burchard, J., Dow, S., Ward, T. R., Kidd, M. J., et al. (2000) Nat. Genet. 25,
333-337.

14. Phillips, J. L., Hayward, S. W., Wang, Y., Vasselli, J., Pavlovich, C.,
Padilla-Nash, H., Pezullo, J. R., Ghadimi, B. M., Grossfeld, G. D., Rivera, A.,
et al. (2001) Cancer Res. 61, 8143-8149.

15. Platzer, P., Upender, M. B., Wilson, K., Willis, J., Lutterbaugh, J., Nosrati, A.,
Willson, J. K., Mack, D., Ried, T. & Markowitz, S. (2002) Cancer Res. 62,
1134-1138.

16. Albertson, D. G., Ylstra, B., Segraves, R., Collins, C., Dairkee, S. H., Kowbel,
D., Kuo, W. L., Gray, J. W. & Pinkel, D. (2000) Nat. Genet. 25,
144-146.

17. Li, R., Yerganian, G., Duesberg, P., Kraemer, A., Willer, A., Rausch, C. &
Hehlmann, R. (1997) Proc. Natl. Acad. Sci. USA 94, 14506-14511.

18. Rasnick, D. & Duesberg, P. H. (1999) Biochem. J. 340, 621-630.






Pollack et al.



TECHNICAL UPDATE

FROM YOUR LABORATORY SERVICES PROVIDER 

HER-2/neu Breast Cancer Predictive Testing 

Julie SanfordHanna, Ph.D. and Dan Mornin, M.D. 



Each year, over 182,000 women in the United States are 
diagnosed with breast cancer, and approximately 45,000 die 
of the disease. 1 Incidence appears to be increasing in the 
United States at a rate of roughly 2% per year. The reasons 
for the increase are unclear, but non-genetic risk factors appear
to play a large role. 2 

Five-year survival rates range from approximately 65%- 
85%, depending on demographic group, with a significant 
percentage of women experiencing recurrence of their cancer 
within 10 years of diagnosis. One of the factors most predictive
for recurrence once a diagnosis of breast cancer has been
made is the number of axillary lymph nodes to which tumor 
has metastasized. Most node-positive women are given adju- 
vant therapy, which increases their survival. However, 20%- 
30% of patients without axillary node involvement also 
develop recurrent disease, and the difficulty lies in how to iden- 
tify this high-risk subset of patients. These patients could 
benefit from increased surveillance, early intervention, and 
treatment. 

Prognostic markers currently used in breast cancer recur- 
rence prediction include tumor size, histological grade, steroid 
hormone receptor status, DNA ploidy, proliferative index, and 
cathepsin D status. Expression of growth factor receptors and 
over-expression of the HER-2/neu oncogene have also been 
identified as having value regarding treatment regimen and 
prognosis. 

HER-2/neu (also known as c-erbB2) is an oncogene that 
encodes a transmembrane glycoprotein that is homologous 
to, but distinct from, the epidermal growth factor receptor. 
Numerous studies have indicated that high levels of expression
of this protein are associated with rapid tumor growth,
certain forms of therapy resistance, and shorter disease-free 
survival. The gene has been shown to be amplified and/or 
overexpressed in 10%-30% of invasive breast cancers and in
40%-60% of intraductal breast carcinoma. 3 

There are two distinct FDA-approved methods by which 
HER-2/neu status can be evaluated: immunohistochemistry
(IHC, HercepTest™) and FISH (fluorescent in situ hybridiza- 
tion, PathVysion™ Kit). Both methods can be performed on 
archived and current specimens. The first method allows visual 
assessment of the amount of HER-2/neu protein present on 
the cell membrane. The latter method allows direct quantifi- 
cation of the level of gene amplification present in the tumor, 
enabling differentiation between low- versus high-amplification.
At least one study has demonstrated a difference in
recurrence risk in women younger than 40 years of age for
low- versus high-amplified tumors (54.5% compared to 
85.7%); this is compared to a recurrence rate of 16.7% for 
patients with no HER-2/neu gene amplification. 4 HER-2/neu 
status may be particularly important to establish in women with 
small 1 cm) tumor size. 

The choice of methodology for determination of HER-2/ 
neu status depends in part on the clinical setting. FDA approval 
for the Vysis FISH test was granted based on clinical trials 
involving 1549 node-positive patients. Patients received one
of three different treatments consisting of different doses of 
cyclophosphamide, Adriamycin, and 5-fluorouracil (CAF). 
The study showed that patients with amplified HER-2/neu 
benefited from treatment with higher doses of Adriamycin-based
therapy, while those with normal HER-2/neu levels did
not. The study therefore identified a sub-set of women, who 
because they did not benefit from more aggressive treatment, 
did not need to be exposed to the associated side effects. In 
addition, other evidence indicates that HER-2/neu amplifica- 
tion in node-negative patients can be used as an independent 
prognostic indicator for early recurrence, recurrent disease at 
any time and disease-related death. 5 Demonstration of HER- 
2/neu gene amplification by FISH has also been shown to be 
of value in predicting response to chemotherapy in stage 2
breast cancer patients. 

Selection of patients for Herceptin® (Trastuzumab) monoclonal
antibody therapy, however, is based upon demonstration
of HER-2/neu protein overexpression using HercepTest™.
Studies using Herceptin® in patients with metastatic breast
cancer show an increase in time to disease progression, 
increased response rate to chemotherapeutic agents and a small 
increase in overall survival rate. The FISH assays have not yet 
been approved for this purpose, and studies looking at response 
to Herceptin® in patients with or without gene amplification
status determined by FISH are in progress. 

In general, FISH and IHC results correlate well. However, 
subsets of tumors are found which show discordant results; 
i.e., protein overexpression without gene amplification or lack 
of protein overexpression with gene amplification. The clini- 
cal significance of such results is unclear. Based on the above 
considerations, HER-2/neu testing at SHMC/PAML will utilize
immunohistochemistry (HercepTest®) as a screen, followed
by FISH in IHC-negative cases. Alternatively, either
method may be ordered individually depending on the clinical
setting or clinician preference.

CPT code information

HER-2/neu via IHC

88342 (including interpretive report)

HER-2/neu via FISH

88271 x2 Molecular cytogenetics, DNA probe, each
88274 Molecular cytogenetics, interphase in situ hybridization,
analyze 25-99 cells
88291 Cytogenetics and molecular cytogenetics, interpretation
and report



Procedural Information 

Immunohistochemistry is performed using the FDA-approved 
DAKO antibody kit, HercepTest®. The DAKO kit contains
reagents required to complete a two-step immunohistochemical
staining procedure for routinely processed, paraffin-embedded
specimens. Following incubation with the primary
rabbit antibody to human HER-2/neu protein, the kit employs 
a ready-to-use dextran-based visualization reagent. This re- 
agent consists of both secondary goat anti-rabbit antibody 
molecules with horseradish peroxidase molecules linked to a 
common dextran polymer backbone, thus eliminating the need 
for sequential application of link antibody and peroxidase 
conjugated antibody. Enzymatic conversion of the subsequently
added chromogen results in formation of visible
reaction product at the antigen site. The specimen is then coun- 
terstained; a pathologist using light-microscopy interprets 
results. 

FISH analysis at SHMC/PAML is performed using the 
FDA-approved PathVysion™ HER-2/neu DNA probe kit, produced 
by Vysis, Inc. Formalin-fixed, paraffin-embedded breast 
tissue is processed using routine histological methods, and then 
slides are treated to allow hybridization of DNA probes to the 
nuclei present in the tissue section. The PathVysion™ kit contains 
two direct-labeled DNA probes, one specific for the 
alphoid repetitive DNA (CEP 17, spectrum orange) present at 
the chromosome 17 centromere and the second for the HER-2/neu 
oncogene located at 17q11.2-12 (spectrum green). Enumeration 
of the probes allows a ratio of the number of copies 
of HER-2/neu to the number of copies of chromosome 17 to 
be obtained; this enables quantification of low versus high 
amplification levels, and allows an estimate of the percentage 
of cells with HER-2/neu gene amplification. The clinically 
relevant distinction is whether the gene amplification is due 
to increased gene copy number on the two chromosome 17 
homologues normally present or an increase in the number of 
chromosome 17s in the cells. In the majority of cases, ratio 
equivalents less than 2.0 are indicative of a normal/negative 
result; ratios of 2.1 and over indicate that amplification is 
present and to what degree. Interpretation of these data will be 
performed and reported from the Vysis-certified Cytogenetics 
laboratory at SHMC. 
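
The ratio-based reading described above can be sketched in a few lines of Python. This is only an illustration, not the laboratory's or the kit manufacturer's scoring procedure. It assumes the ratio is taken as HER-2/neu signals over CEP 17 signals (the orientation under which the quoted cutoffs of 2.0 and 2.1 make sense), and every function name, along with the "borderline" label for the undefined 2.0 to 2.1 interval, is hypothetical.

```python
from statistics import mean


def her2_cep17_ratio(her2_counts, cep17_counts):
    """Total HER-2/neu signals divided by total CEP 17 signals over the scored nuclei."""
    if not her2_counts or len(her2_counts) != len(cep17_counts):
        raise ValueError("need one HER-2/neu and one CEP 17 count per scored nucleus")
    total_cep17 = sum(cep17_counts)
    if total_cep17 == 0:
        raise ValueError("no CEP 17 signals were scored")
    return sum(her2_counts) / total_cep17


def interpret_ratio(ratio):
    """Apply the cutoffs quoted in the text; 'borderline' is a hypothetical label."""
    if ratio < 2.0:
        return "normal/negative"
    if ratio >= 2.1:
        return "amplified"
    return "borderline"  # the text does not define the 2.0 to 2.1 interval


def summarize_specimen(her2_counts, cep17_counts):
    """Report the ratio, the call, and the mean chromosome 17 copy number per nucleus."""
    ratio = her2_cep17_ratio(her2_counts, cep17_counts)
    return {
        "ratio": ratio,
        "call": interpret_ratio(ratio),
        "mean_cep17_per_nucleus": mean(cep17_counts),
    }


if __name__ == "__main__":
    # Illustrative counts only, not patient data.
    print(summarize_specimen([10, 12, 8, 10], [2, 2, 2, 2]))  # extra gene copies
    print(summarize_specimen([4, 4, 4, 4], [4, 4, 4, 4]))     # extra chromosome 17s
```

The second call shows why the ratio matters: four HER-2/neu copies per nucleus accompanied by four chromosome 17 centromeres gives a ratio of 1.0, so the extra gene copies reflect additional chromosomes rather than amplification on the two normal homologues.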
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Genetic Instability in Epithelial 
Tissues at Risk for Cancer 
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Abstract: Epithelial tumors develop through a multistep process driven by 
genomic instability frequently associated with etiologic agents such as prolonged 
tobacco smoke exposure or human papilloma virus (HPV) infection. 
The purpose of the studies reported here was to examine the nature of genomic 
instability in epithelial tissues at cancer risk in order to identify tissue genetic 
biomarkers that might be used to assess an individual's cancer risk and 
response to chemopreventive intervention. As part of several chemoprevention 
trials, biopsies were obtained from risk tissues (i.e., bronchial biopsies from 
chronic smokers, oral or laryngeal biopsies from individuals with premalignancy) 
and examined for chromosome instability using in situ hybridization. 
Nearly all biopsy specimens show evidence for chromosome instability 
throughout the exposed tissue. Increased chromosome instability was observed 
with histologic progression in the normal to tumor transition of head and neck 
squamous cell carcinomas. Chromosome instability was also seen in premalignant 
head and neck lesions, and high levels were associated with subsequent 
tumor development. In bronchial biopsies of current smokers, the level of 
ongoing chromosome instability correlated with smoking intensity (e.g., 
packs/day), whereas the chromosome index (average number of chromosome 
copies per cell) correlated with cumulative tobacco exposure (i.e., pack-years). 
Spatial chromosome analyses of the epithelium demonstrated multifocal clonal 
outgrowths. In former smokers, random chromosome instability was reduced; 
however, clonal populations appeared to persist for many years, perhaps 
accounting for continued lung cancer risk following smoking cessation. 

Keywords: chromosome instability; epithelial cells; aerodigestive tract; 
chemoprevention; cancer risk 



THE NEED FOR BIOMARKERS OF CANCER RISK AND 
RESPONSE TO INTERVENTION 

Epithelial cancers remain a major health challenge in the world. Despite improvements 
in staging and the application and integration of surgery, radiotherapy, and 
chemotherapy, the 5-year survival rate for individuals with lung cancer is only about 
15%. 1 Even if strategies for early detection are successful and lung cancers 
are detected at a stage where local tumor resection and treatment is curative, 
these patients will still be at significant risk for developing second primary tumors 
associated with the problem of field cancerization. 2 Similarly, for individuals with a 
first head and neck primary tumor, even if the first malignancy is successfully treat- 
ed, the risk of developing a second primary in the tobacco smoke-exposed field is 
approximately 40%. 3 Similar cancer risk estimates exist for individuals who exhibit 
severe dysplasia in premalignant epithelial lesions. 4 For these reasons, it is important 
to focus on chemopreventive strategies to prevent the development of epithelial 
malignancies. 

Several problems confront chemoprevention trials designed to identify efficacious 
agents. 5 First, chemoprevention trials with cancer incidence as a primary endpoint 
require tens of thousands of subjects and tens of years of intervention and 
follow-up for statistical evaluation. For example, a recently reported trial involved 
30,000 subjects and required 10 years in order to examine the impact of prevention 
strategies on lung cancer development, only to find a possible increased lung cancer 
incidence in current smokers who received β-carotene. 6 

The problem of large, long-term trials results from the difficulty in identifying 
individuals at highest cancer risk who might best benefit from chemopreventive 
intervention. For example, 20 pack-year smokers, while known to be at relatively 
increased risk for developing lung cancer, have approximately a 10% lifetime risk 
for developing lung cancer. This seriously limits the number of potentially useful 
strategies that can be clinically explored. A second problem facing chemoprevention 
trials is that little is known about what agents are likely to have efficacy, and even 
less is known regarding proper doses, schedules, and durations of treatment. Part of 
the reason for this problem is that too little is known about the physiologic processes 
that drive epithelial cancer development. 

In order to reduce the number of subjects and the time required to carry out 
chemoprevention trials and thus allow the exploration of multiple prevention strate- 
gies, two types of advances are necessary. First, it is important to identify individuals 
at significantly increased cancer risk who might best benefit from different types of 
intervention. Second, in order to allow the rapid identification of agents, doses, and 
schedules of potentially efficacious agents, it is necessary to identify and validate 
surrogate endpoints of response that indicate whether the agents are having a posi- 
tive impact on the target tissue during the chemopreventive intervention. 

One approach to identifying individuals at increased aerodigestive tract cancer 
risk is to explore epidemiologic features of potential subjects. Molecular epidemio- 
logic studies are beginning to identify intrinsic host factors that place some individ- 
uals at increased cancer risk, especially those with a chronic smoking history. 8 Most 
intrinsic factors identified thus far reflect levels of carcinogen metabolism, repair 
capabilities of the host following DNA damage, and other measures of intrinsic 
cellular sensitivity to mutagens. While these factors can provide statistically significant 
risk ratios in case-control studies that are controlled for tobacco exposure, the 
detected risk ratios usually fall in the range of 1.5 to 10. Unfortunately, this is not 
sufficient for the individualization of treatment and is not sufficiently high to signif- 
icantly reduce the numbers of subjects required for chemoprevention trials with 
cancer incidence as the primary endpoint. 

Another approach to identifying individuals at increased cancer risk is to directly 
examine the target tissue of individuals with known carcinogen exposure (e.g., 
chronic tobacco smoke exposure), who have evidence of target organ dysfunction 
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(e.g., chronic obstructive pulmonary disease, changes in voice quality), or who 
have clinical evidence of premalignancy (e.g., bronchial metaplasia/dysplasia, oral 
leukoplakia/erythroplakia, cervical intraepithelial neoplasia). The conventional 
standard for assessing cancer risk in these situations is the degree of histological 
change. However, while individuals who show moderate to severe dysplasia are 
known to be at increased cancer risk when compared to individuals with lesser histologic 
changes, it is often difficult to distinguish reactive changes to carcinogenic 
insult from initiated and progressing lesions. Similarly, upon cessation of carcinogenic 
insult, histologic changes may reverse yet cancer risk may continue for many 
years. For example, while smoking cessation is associated with decreased bronchial 
metaplasia, 9 increased lung cancer risk continues for many years beyond smoking 
cessation. 10 In fact, nearly half the newly diagnosed lung cancer cases in the USA 
occur in former smokers. 11 

The development of assays to identify individuals at high epithelial cancer risk 
and to directly assess response to intervention in the target tissue is therefore an 
important research goal. Such assays should be objective and easily quantifiable and, 
if possible, minimally invasive. Moreover, they should reflect both the disease pro- 
cess and the targeted pathway and thereby be useful in assessing risk and monitoring 
response to intervention as well as directly testing the hypothesized mechanism of 
action of the chemopreventive strategy. 

In the chemoprevention setting it is important to recognize that one does not 
know the location of the future cancer. Thus, assays must necessarily be carried out 
on random biopsies of the field at risk. Even if there are clinically evident premalig- 
nant lesions, this does not mean that this is the likely site for a future malignancy. 
For example, nearly half of the cancers that develop in individuals with oral leuko- 
plakia arise away from the original index lesion. Similarly, since many newly diag- 
nosed lung cancers arise in the peripheral parts of the lung (e.g., adenocarcinomas), 
especially in former smokers, and since endobronchoscopy predominantly accesses 
central components of the lung, it is important to identify biomarkers that can reflect 
global processes ongoing in the target epithelial field associated with increased can- 
cer risk. Their discovery requires a better understanding of the tumorigenesis pro- 
cess in epithelial fields at cancer risk. 



THE RATIONALE FOR STUDYING 
GENOMIC INSTABILITY AS A MARKER OF RISK 

Tumors of the aerodigestive tract have been proposed to reflect a "field cancerization" 
process whereby the whole tissue is exposed to carcinogenic insult (e.g., tobacco 
smoke) and is at increased risk for multistep tumor development. 12,13 Several 
types of clinical and laboratory data support this notion, including the frequent 
occurrence of synchronous primary and subsequent second primary tumors in the 
aerodigestive tract (frequently exhibiting dissimilar histologies as well as distinct 
genetic signatures 14-16) and the presence of premalignant lesions that precede and/or 
accompany the tumor in the exposed tissue field. 17 The notion of a multistep tumorigenesis 
process is further supported by serial clinical and histologic evaluations of 
target tissue or exfoliated cells where increasing degrees of histological abnormalities 
are observed over time. 18 

A working model for aerodigestive tract tumorigenesis is illustrated in Figure 1. 
Tumorigenesis in the face of carcinogenic exposure likely involves a chronic process 
of tissue injury and wound healing. DNA damage induced by the carcinogen is likely 
fixed into permanent genetic changes (e.g., chromosome damage, chromosome non- 
disjunction, gene mutation, gene deletion, etc.) during the process of proliferation. 
This damage would be expected to be distributed throughout the exposed tissue field 
leading to a background of generalized genomic damage (depicted in Figure 1 as a 
background mat of increasing density). Chronic injury and repair likely leads to the 
accumulation of cells with increasing amounts of genetic changes as well as the out- 
growth of abnormal clones (triangles in Figure 1) carrying an accumulation of 
genetic changes important for selective survival, dysregulated growth, and preferen- 
tial epithelial take-over by initiated clones (see Figure 2). 

Cellular and molecular evidence for the field carcinogenesis and multistep tumorigenesis 
model comes from many laboratories. 19,20 With the advent of a wide array 
of molecular technologies, a large number of specific molecular genetic and epige- 
netic changes involving specific oncogenes, tumor suppressor genes, cell regulatory 
genes, and repair genes have now been described for aerodigestive tract cancers. The 
identification of these specific molecular changes has now provided probes to 
explore specific events occurring in premalignant lesions adjacent to aerodigestive 
tract tumors. 21-24 Frequently, these premalignant lesions showed a subset of the 
same molecular changes found in the associated tumor, suggesting that these lesions 
might represent precursor lesions for the associated tumors (i.e., a manifestation of 
a multistep tumorigenesis process). For example, studies of the premalignant lesions 
adjacent to head and neck tumors have provided evidence for a gradual accumulation 
of genetic alterations accompanied by evidence for dysregulation of cellular control 
mechanisms (e.g., alterations in expression of PCNA, EGFR, TGF-β, p53, and 
cyclin D1). 25-28 

FIGURE 1. Field cancerization and multistep tumorigenesis. 

FIGURE 2. Multiple focal clonal evolution during multistep tumorigenesis. 

These types of studies have now also been applied to the target epithelium of indi- 
viduals at increased risk for aerodigestive tract cancer (i.e., individuals with a chron- 
ic smoking/alcohol history and/or prior aerodigestive tract cancer). Several groups 
(using polymerase chain reaction, PCR, analysis of microdissected epithelium) have 
now demonstrated the presence of clonal outgrowths in the target premalignant epithelium 
of individuals at increased risk for cancer. 29-31 For example, examination of 
bronchial biopsies derived from individuals with a 20 pack-year smoking history 
demonstrated that 76% of the cases showed evidence for LOH (3p14, 9p21, or 
17p13) in at least one of six lung biopsy sites. On a per site basis, some form of LOH 
was observed in 25% of the sites examined. 29 

If aerodigestive tract cancer development reflects a field cancerization process 
involving multistep events, then risk and response information should be able to be 
derived from random biopsies or exfoliated cells from the field at risk or from assess- 
ments of tissue undergoing similar processes. Hypothetically, lesions exhibiting the 
greatest degree of genomic instability, clonal outgrowth, and abnormal epithelial 
regulation would be at the highest relative aerodigestive tract cancer risk. Similarly, 
an active chemopreventive intervention might be expected to decrease these mani- 
festations of risk. Reduced risk manifestations include decreased levels of ongoing 
genetic instability, decreased frequency of clonal outgrowths, and increased epithelial 
growth regulation. 
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THE MEASUREMENT OF CHROMOSOME INSTABILITY USING 
CHROMOSOME IN SITU HYBRIDIZATION 

Molecular genetic techniques, while extremely useful for detecting clonal changes 
in target tissues, are somewhat limited in their ability to detect random genetic 
instability. Conventional cytogenetic assays are useful for detecting chromosome 
instability and clonal chromosome changes. However, they require numbers of 
dividing cells for karyotypic analysis that are difficult to attain in the setting of biopsies 
acquired during the course of a chemoprevention trial. A technique was therefore 
needed that would allow chromosome instability measurements in situations 
where few cells are available (e.g., small biopsies, brushings, or sputum samples) and 
where the target material might be fixed. It was also desirable to have a technique 
that would be adaptable to tissue sections, whereby spatial information could be 
retained and genotype/phenotype associations could be determined on the same or 
adjacent tissue sections. The technique of in situ hybridization (ISH) involves the 
use of DNA probes that recognize either chromosome-specific repetitive target 
sequences, chromosome single gene copy sequences, or sequences along the whole 
chromosome length or chromosome segments. 32 We have adapted the ISH technique 
for formalin-fixed, paraffin-embedded tissue sections and have applied it to a variety 
of tissues, including the aerodigestive tract. 33,34 

Using probes that label the centromere regions of specific chromosomes, this 
assay permits determination of the average chromosome number per cell for each 
specimen. This assay is also useful for detecting generalized chromosome instability 
during the tumorigenesis process. Normal diploid populations should have two copies 
of each autosomal chromosome and should rarely show three or more chromosome 
copies per cell (chromosome polysomy), especially in tissue sections where 
nuclear truncation results in an under-representation of chromosome copy number. 
Thus, the detection of cells with three or more chromosome copies would indicate 
the presence of chromosome instability. 
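
The two measurements described above, the average chromosome number per cell and the fraction of cells with three or more copies, reduce to simple arithmetic on per-cell signal counts. The following Python sketch is offered only as an illustration; the function names and the example counts are hypothetical and are not taken from the studies.

```python
def chromosome_index(copies_per_cell):
    """Average number of probe signals (chromosome copies) per scored cell."""
    if not copies_per_cell:
        raise ValueError("no cells were scored")
    return sum(copies_per_cell) / len(copies_per_cell)


def polysomy_fraction(copies_per_cell, min_copies=3):
    """Fraction of scored cells showing at least `min_copies` chromosome copies."""
    if not copies_per_cell:
        raise ValueError("no cells were scored")
    polysomic = sum(1 for copies in copies_per_cell if copies >= min_copies)
    return polysomic / len(copies_per_cell)


if __name__ == "__main__":
    # Illustrative section: many nuclei show a single signal because the
    # section plane truncates them, so the index falls below 2 even for
    # a diploid population.
    counts = [1] * 50 + [2] * 45 + [3] * 4 + [4] * 1
    print(chromosome_index(counts))
    print(polysomy_fraction(counts))
```

Because truncation can only remove signals, it lowers the index but cannot create cells with three or more copies, which is why the polysomy fraction is the more direct indicator of instability in sectioned tissue.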

To examine this technique's potential for characterizing the multistep tumorigen- 
esis process in the aerodigestive tract, we measured the fraction of cells exhibiting 
three or more chromosome copies in apparently contiguous epithelial transitions 
from normal to hyperplastic to dysplastic to carcinomas, all on a single tissue slice 
of head and neck squamous cell carcinomas. 34 In these specimens, greater than 35% 
of the cases of adjacent "normal" epithelium, greater than 65% of the cases of hyper- 
plastic epithelium, and greater than 95% of the dysplastic and tumor regions showed 
evidence of chromosome polysomy. Of interest, similar transitions of chromosome 
instability were observed with at least four different chromosome probes. Similar 
trends have also been observed in amenable tissue from other epithelial malignancies, 
including cervix, bladder, and breast. 35 These results thus suggested that the 
notions of field cancerization and multistep tumorigenesis might apply to several 
epithelial tissues and that measures of chromosome instability might be useful for 
monitoring this process. 

In the situations described above, the premalignant lesions examined might be 
considered to represent epithelium at 100% risk of being in a cancer field, since they 
were located in the adjacent epithelium to the cancer. This then raises the question 
of the nature of genetic instability in the epithelium of individuals at increased risk 
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for developing cancer. To explore this issue, we obtained biopsies during the course 
of leukoplakia chemoprevention trials exploring the use of 13-cis-retinoic acid in 
reversing leukoplakia and probed them for genetic instability using in situ hybridization. 
In one retrospective study and in one prospective study of subjects with oral 
leukoplakia, the results indicate that those subjects whose pretreatment biopsies harbor 
relatively high levels of genomic instability (i.e., more than 3% of the cells 
examined showing at least 3 chromosome 9 copies per cell) have a significantly 
higher likelihood of suffering early onset of head and neck cancer. 36,37 Interestingly, 
half of the tumors that did develop occurred away from the biopsy site used to mea- 
sure genetic instability. This result suggests that genomic instability measurements 
in carcinogen-exposed tissue can provide useful cancer risk estimates. 



THE RELATIONSHIP BETWEEN TOBACCO EXPOSURE AND 
CHROMOSOME INSTABILITY 

In recent years, the aerodigestive tract chemoprevention group at M.D. Anderson 
Cancer Center has initiated three sequential biomarker-associated chemoprevention 
trials involving chronic smokers with a greater than 20 pack-year smoking history. 
In each of these studies, endobronchial biopsies were obtained from sites 
within the lung, including the carina and at bifurcation points at the upper, middle, 
and lower right lung and at the upper and lower left lung. Biopsies were obtained prior 
to and following chemopreventive intervention and were subjected to in situ 
hybridization analysis in addition to analyses for other biomarkers. The first important 
finding was that some degree of chromosome polysomy was evident in all lung 
sites examined, and this was observed independently of the particular chromosome 
probe utilized. 38 This finding supports the notion that random chromosome changes 
may be occurring throughout the exposed lung field. 

In a second study, bronchial biopsies were obtained from individuals with a 20 
pack-year smoking history. In this study, most of the subjects involved were current 
smokers. 39 Interestingly, all cases who showed metaplasia at one of six biopsy sites 
also showed chromosome polysomy in at least one biopsy site; overall, 88% of the 
sites showed some evidence of chromosome 9 polysomy. 40 Evidence for genetic 
instability was also detected in patients who did not show evidence of bronchial 
metaplasia in any of six biopsy sites despite a strong smoking history. In fact, more 
than 90% of the cases and more than 60% of the sites showed significant chromosome 
polysomy (i.e., at least three copies in at least 2% of the cells examined). 
These results suggest that the lungs of long-term smokers show significant evidence 
of genetic instability, and this instability can be detected throughout the accessible 
bronchial tree, even when bronchial metaplasia is not evident. 
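
The per-site and per-case percentages quoted above can be tallied as in the Python sketch below. It applies the stated 2% criterion to each biopsy site and, as an assumption not spelled out in the text, counts a case as positive when at least one of its sites meets that criterion. The names and the example values are hypothetical.

```python
SIGNIFICANT_POLYSOMY = 0.02  # at least 2% of cells with three or more copies


def summarize_polysomy(cases, threshold=SIGNIFICANT_POLYSOMY):
    """Return (fraction of sites positive, fraction of cases positive).

    `cases` maps a case identifier to the polysomy fractions measured at
    each of that case's biopsy sites. A case counts as positive when any
    one of its sites reaches the threshold (an assumption of this sketch).
    """
    site_flags = [value >= threshold for sites in cases.values() for value in sites]
    case_flags = [any(value >= threshold for value in sites) for sites in cases.values()]
    if not site_flags:
        raise ValueError("no biopsy sites were supplied")
    return sum(site_flags) / len(site_flags), sum(case_flags) / len(case_flags)


if __name__ == "__main__":
    # Illustrative values only, not study data.
    example = {
        "case A": [0.010, 0.030, 0.025],
        "case B": [0.000, 0.010],
        "case C": [0.050],
    }
    per_site, per_case = summarize_polysomy(example)
    print(per_site, per_case)
```

Under this counting rule the per-case figure is never lower than it would be if positivity required several sites, which is consistent with the per-case percentage in the text exceeding the per-site percentage.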

These studies in current smokers have allowed us to examine the relationship 
between the levels of genetic instability detected and subject characteristics such as 
smoking status (current or former), smoking history, and lung tissue pathologic 
changes. Evaluable biopsy material has now been obtained from more than 108 cur- 
rent smokers, including more than 480 evaluable biopsy sites. The mean metaplasia 
index in these current smokers was 30.4%. For the total population studied, the 
median chromosome index for the bronchial biopsies was 1.41 (range, 1.04-1.61) 
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and the median chromosome polysomy index was 2.0% (range 0-8.7%). This can be 
compared to a mean chromosome index between 1.2-1.4 for lymphocytes and very 
rare chromosome polysomy. Interestingly, the intrasubject variability in chromo- 
some instability was relatively low in most subjects and was less than the intersub- 
ject variability. These results suggested that chronic smokers harbor detectable 
chromosome instability throughout the accessible bronchial tree (supporting the 
field carcinogenesis notion) and that information from one biopsy site might yield 
representative information for the rest of the lung field. 

Since most of the current smokers exhibited bronchial metaplasia in at least one 
of the biopsied sites, this allowed us to examine the relationship between chromo- 
some instability and histologic changes, both on a site-by-site basis and on a per case 
basis. On a site-by-site basis, the chromosome indices of lesions showing squamous 
metaplasia were similar to those not showing metaplasia (i.e., median 1.43 vs. 1.43), 
and the degree of chromosome polysomy in metaplastic lesions was only slightly 
higher than in non-metaplastic sites (medians: 2.2% vs. 1.8%, respectively). Thus, 
the presence or absence of squamous metaplasia at a biopsy site does not necessarily 
correlate with the degree of underlying genomic instability. On the other hand, those 
subjects with metaplasia indices of at least 15% also showed higher levels of chro- 
mosome polysomy than did subjects with metaplasia index below 15% (medians: 
2.4% vs. 1.8%, p = 0.005). Thus, these chromosome instability assessments in current 
smokers appeared to reflect a more global process in the lung field. 

Tobacco exposure has been shown to significantly increase the risk of developing 
lung cancer, and the degree of risk is related to the extent of tobacco exposure. We 
were interested in determining the relationship between individuals' smoking histo- 
ry parameters and the levels of chromosome change found in their lungs following 
years of tobacco exposure. While there was significant intersubject variation for sim- 
ilar tobacco exposure histories, overall there was a significant correlation between 
the degree of chromosome polysomy and the intensity of ongoing tobacco exposure 
(packs/day, p = 0.02 on a per site basis) and with the extent of tobacco exposure 
(pack-years, p = 0.003). Thus the amount of chromosome polysomy reflects the 
intensity and extent of tobacco exposure. At the same time, individuals with similar 
smoking histories showed widely divergent amounts of chromosome polysomy, pos- 
sibly reflecting differences in intrinsic sensitivity between subjects. There was also 
a strong correlation between the chromosome index and the duration of the smoking 
history (smoking years) and total accumulated exposure (pack-years, p = 0.0001). 
These results suggest that tobacco exposure is associated with the initiation and 
accumulation of chromosome instability in the exposed lung; however, individuals 
are differentially sensitive to carcinogenic insult. The working hypothesis is that 
those individuals who accumulate the highest degree of chromosome changes will 
be at the highest lung cancer risk. 

Many of the bronchial biopsies from chronic smokers examined by in situ hybridization 
showed a rise in the chromosome index above that expected for a diploid cell 
population, especially in subjects with an extensive smoking history. The rise in 
chromosome index was also accompanied by an increase in the fraction of cells 
exhibiting at least 3 chromosome copies per cell. To determine if a rise in the tissue 
chromosome index was due to clonal expansion of populations with chromosome tri- 
somy, the chromosome copy number and relative coordinates of each cell scored in 
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the bronchial epithelium was recorded and a spatial genetic map was created. 41 We 
then developed algorithms for calculating localized chromosome indices within the 
tissue. Since trisomic clones would have, on average, three chromosomes instead of 
two, those cells involved in neighborhoods with chromosome indices three-halves 
that of diploid populations could be marked as being part of a trisomic clone. Similarly, 
groups of cells with chromosome indices half that of diploid populations could 
be marked as being part of a monosomic clone. This allowed the generation of a sec- 
ond-order, two-dimensional genetic map representation of the bronchial epithelium 
showing the relative locations of cells involved in monosomic and trisomic clonal 
outgrowths. When adjacent tissue sections from the same bronchial biopsy were 
probed separately for different chromosomes, the detected clones appeared to occu- 
py separate subregions of the epithelium. This result suggests that not only are the 
lungs of chronic smokers undergoing a process of genetic instability, they are expe- 
riencing the outgrowth of multiple clones throughout the exposed lung field, as pos- 
tulated by the models shown in Figures 1 and 2. One advantage of this clonal 
approach is that the contribution of both monosomic and multisomic clones can be 
detected. 
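
The neighborhood rule described above can be illustrated with a short Python sketch. The authors' actual algorithms are not given in the text, so this is only one plausible reading: each cell's local chromosome index is the mean copy number of all scored cells within a fixed radius, and the cell is labeled by comparing that local index with the index expected for a diploid population. The radius, the tolerance, and all names are assumptions introduced for illustration.

```python
import math


def local_index(cells, center, radius):
    """Mean copy number of the scored cells lying within `radius` of `center`."""
    cx, cy = center
    nearby = [copies for (x, y, copies) in cells
              if math.hypot(x - cx, y - cy) <= radius]
    return sum(nearby) / len(nearby)


def label_clones(cells, diploid_index, radius, tolerance=0.15):
    """Label each (x, y, copies) cell as 'trisomic', 'monosomic', or 'none'.

    A neighborhood near three-halves of the diploid index is called trisomic
    and one near half of it monosomic. `tolerance` (hypothetical) sets how
    close to those targets the local index must come.
    """
    labels = []
    for (x, y, _copies) in cells:
        # The cell itself always lies in its own neighborhood, so the
        # neighborhood is never empty.
        relative = local_index(cells, (x, y), radius) / diploid_index
        if relative >= 1.5 * (1 - tolerance):
            labels.append("trisomic")
        elif relative <= 0.5 * (1 + tolerance):
            labels.append("monosomic")
        else:
            labels.append("none")
    return labels


if __name__ == "__main__":
    # Three well-separated illustrative clusters of four cells each.
    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    cells = ([(x, y, 3) for (x, y) in square]
             + [(x + 100, y, 2) for (x, y) in square]
             + [(x + 200, y, 1) for (x, y) in square])
    print(label_clones(cells, diploid_index=2.0, radius=5.0))
```

The example passes a diploid index of 2.0 for simplicity. In sectioned tissue the reference value would be lower because of nuclear truncation, so in practice `diploid_index` would presumably be taken from a control population scored under the same conditions.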

Since smoking cessation has been suggested to reduce the lung cancer risk, it was 
of interest to determine whether the levels of chromosome instability would decrease 
following smoking cessation. This question was possible to examine because our 
third sequential chemoprevention trial involved subjects who had discontinued 
smoking. So far, more than 220 subjects (more than 650 biopsies) who have quit 
smoking (mean 9.9 quit-years) have been evaluated for chromosome instability in 
their lungs. Despite the fact that the mean metaplasia index in this group is 5.8% 
(considerably less than that in current smokers), chromosome instability is still 
observed in the majority of subjects. 42 While the mean chromosome polysomy level 
is reduced to 1.0%, some individuals continue to show polysomy levels above 5%. 
Interestingly, while the overall chromosome polysomy levels were reduced in these 
individuals who stopped smoking, the mean chromosome index remained at about 
1.4 with some individuals exhibiting chromosome indices as high as 1.8. Initial chromosome 
mapping studies suggest that while random chromosome instability seems 
to decrease following smoking cessation, the clonal outgrowths may remain for 
many years in the lung. The working hypothesis is that those individuals who show 
the greatest degree of remaining chromosome instability are at the highest lung cancer 
risk despite smoking cessation. Long-term follow-up on these subjects will be 
necessary to test this hypothesis. 



SUMMARY AND CONCLUSIONS 

Aerodigestive tract tumorigenesis appears to be a multistep process taking place 
throughout the tissue fields of exposure. When viewed in the context of chromosome 
changes, carcinogen exposure appears to be associated with the random acquisition 
of chromosome polysomy throughout the exposed field, the degree of which is related 
to the degree and extent of carcinogen exposure as well as to the intrinsic susceptibility 
of the exposed individual. Continued exposure leads to continued acquisition 
of new changes and, in association with chronic wound-healing processes, to the 
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accumulation of clonal outgrowths throughout the target tissue. Although the ulti- 
mate malignancy may occur in only one or few tissue sites, manifestations of the 
instability process that drives tumorigenesis are globally present in the tissue. Thus 
random biopsies may provide useful risk information for the exposed field as a 
whole. Even when carcinogen exposure is reduced or chemopreventive strategies are 
initiated and histologic manifestations of the tumorigenesis process subside, the 
genetic scars of prior exposure remain in the form of clonal outgrowths and may 
explain continued lung cancer risk in ex-smokers. Future chemoprevention strategies 
need to focus on reducing the degree of chromosome instability and on trying to 
eliminate residual abnormal clonal outgrowths in the aerodigestive tract. In this setting, 
the measurement of chromosome instability in the target tissue will be useful in 
assessing cancer risk as well as response to intervention. 



The studies reviewed here represent one component of the collaborative efforts 
of the Aerodigestive Tract Chemoprevention team at The University of Texas M.D. 
Anderson Cancer Center, Houston, Texas. The studies were supported in part by 
National Institutes of Health, National Cancer Institute Grants CA [illegible], CA 68437, 
CA 79437, CA 16672, CA 68089, CN 25433, CA 86390, CA 70907, NIH DE 13157, 
and the State of Texas Tobacco Research Fund. 
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Abstract 

Early identification and subsequent intervention are 
needed to decrease the high mortality rate associated with 
lung cancer. The examination of bronchial epithelium for 
genetic changes could be a valuable approach to identify 
individuals at greatest risk. The purpose of this 
investigation was to assay cells recovered from 
nonmalignant bronchial epithelium by fluorescence in situ 
hybridization for trisomy of chromosome 7, an alteration 
common in non-small cell lung cancer. Bronchial 
epithelium was collected during bronchoscopy from 16 
cigarette smokers undergoing clinical evaluation for 
possible lung cancer and from seven individuals with a 
prior history of underground uranium mining. Normal 
bronchial epithelium was obtained from individuals 
without a prior history of smoking (never smokers). 
Bronchial cells were collected from a segmental bronchus 
in up to four different lung lobes for cytology and tissue 
culture. Twelve of 16 smokers were diagnosed with lung 
cancer. Cytological changes found in bronchial epithelium 
included squamous metaplasia, hyperplasia, and atypical 
glandular cells. These changes were present in 33, 42, and
47% of sites from lung cancer patients, smokers, and
former uranium miners, respectively. Less than 10% of
cells recovered from the diagnostic brush had cytological
changes, and in several cases, these changes were present
within different lobes from the same patient. Background
frequencies for trisomy 7 were 1.4 ± 0.3% in bronchial
epithelial cells from never smokers. Eighteen of 42 
bronchial sites from lung cancer patients showed 
significantly elevated frequencies of trisomy 7 compared 
to never smoker controls. Six of the sites positive for 
trisomy 7 also contained cytological abnormalities. 
Trisomy 7 was found in six of seven patients diagnosed 
with squamous cell carcinoma, one of one patient with
adenosquamous cell carcinoma, but in only one of four 
patients with adenocarcinoma. A significant increase in 
trisomy 7 frequency was detected in cytologically normal 
bronchial epithelium collected from four sites in one 
cancer-free smoker, whereas epithelium from the other 
smokers did not contain this chromosome abnormality. 
Finally, trisomy 7 was observed in almost half of the 
former uranium miners; three of seven sites positive for 
trisomy 7 also exhibited hyperplasia. Two of the former 
uranium miners who were positive for trisomy 7 
developed squamous cell carcinoma 2 years after 
collection of bronchial cells. To determine whether the
increased frequency of trisomy 7 reflects generalized
aneuploidy or specific chromosomal duplication, a
subgroup of samples was evaluated for trisomy of 
chromosome 2; the frequency was not elevated in any 
of the cases as compared with controls. The studies 
described in this report are the first to detect and 
quantify the presence of trisomy 7 in subjects at risk for 
lung cancer. These results also demonstrate the ability to 
detect genetic changes in cytologically normal cells, 
suggesting that molecular analyses may enhance the 
power for detecting premalignant changes in bronchial 
epithelium in high-risk individuals. 

Introduction 

Although lung cancer is the leading cause of cancer death in the
United States (1), early detection and intervention could de-
crease the high mortality rate associated with this disease if
sensitive screening approaches could be developed (2-4). Early 
detection may be feasible because the entire respiratory tract is 
exposed to inhaled carcinogens; therefore, the whole lung is at 
risk for developing multiple, independently initiated sites. This 
"field cancerization" condition (5) is supported clinically by a 
high frequency of second primary tumors in lung cancer pa-
tients (6-9) and by the occurrence of progressive histological
premalignant changes throughout the lower respiratory tract of 
cigarette smokers (10, 11). Moreover, recent studies using 
pathological tissues obtained after lung resection or autopsy 
have identified genetic aberrations associated with lung cancer 
in nonmalignant bronchial epithelium adjacent to tumors (12- 
16). 

Although examination of pathological samples is useful 
for identifying genetic changes associated with carcinogenesis, 
this invasive approach for collection of clinical samples nec-
essary for early detection would not be appropriate for screen-
ing. However, bronchial epithelial cells harvested using routine
clinical procedures could be examined for genetic changes as an 
initial approach for detecting individuals at high risk for lung 
cancer. This approach could also provide genetic markers for 
evaluating the effectiveness of chemoprevention regimens. 
Bronchoscopy provides direct access to viable cells within the
airways and is a commonly used tool for obtaining samples
from the lower respiratory tract, including bronchial epithelium
(17). This procedure can be used to repeatedly sample the 
bronchial epithelium over time and to collect viable cells that 
can be expanded through tissue culture for functional assays. 

Because of field cancerization, genetic abnormalities 
should be dispersed throughout the bronchial epithelium of 
persons at risk for lung cancer. The purpose of this investiga- 
tion was to test this hypothesis by sampling nonmalignant 
bronchial epithelium from distinct locations within four differ- 
ent lobes of the lung from persons at risk for lung cancer and 
then assaying the bronchial cells for the presence of specific 
genetic abnormalities. Trisomy of chromosome 7 was exam- 
ined in these cells, because this alteration is common in solid 
tumors, including lung cancer, of several different organ sys- 
tems (18, 19). In addition, trisomy 7 has been detected in 
premalignant lesions such as villous adenoma of the colon (20), 
in the colonic mucosa of individuals with familial polyposis 
(21), and in the far margins of some resected lung tumors (22). 
Our results demonstrate that trisomy 7 can be detected in 
nonmalignant bronchial epithelium from patients with lung
cancer distant to the site of the tumor and in individuals without 
tumors who are at high risk for lung cancer development. 
Together, these studies suggest that an extra copy of chromo- 
some 7 may be an intermediate biomarker of ongoing field 
carcinogenesis. 

Materials and Methods 

Subject Recruitment. Bronchial epithelium was collected 
from 16 cigarette smokers undergoing a diagnostic workup for 
possible lung cancer and from 7 individuals with a prior history 
of underground uranium mining, 5 of whom were also smokers. 
Three individuals who had never smoked were also recruited to 
obtain bronchial epithelium not exposed directly to either to- 
bacco smoke or radon progeny. 

Pathology and Exposure History. Twelve of the 16 cigarette 
smokers who underwent diagnostic bronchoscopy were diag-
nosed with NSCLC.3 Seven tumors were characterized histo-
logically as SCCs, four tumors were ACs, and one tumor was
an adenosquamous cell carcinoma. Lung cancer was not evi-
dent in the other four subjects. Smoking histories ranged from
15 to 120 pack-years (defined as the number of packs of cigarettes
smoked per day times the number of years smoked). All of the
former uranium miners worked underground between 2 and 20 
years, with a range of 27-527 working level months. Five of the 
seven miners had smoking histories that ranged from 20-60 
pack-years. 
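
The pack-year figures quoted above can be illustrated with a short calculation. This is only a sketch: it assumes the conventional definition (packs smoked per day multiplied by years smoked) and a pack size of 20 cigarettes, neither of which is specified in the study, and the function name `pack_years` is hypothetical.

```python
# Assumption: one pack holds 20 cigarettes (conventional, not stated in the study).
CIGARETTES_PER_PACK = 20


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Return cumulative smoking exposure in pack-years."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    packs_per_day = cigarettes_per_day / CIGARETTES_PER_PACK
    return packs_per_day * years_smoked


# Hypothetical subject: 30 cigarettes per day for 40 years.
print(pack_years(30, 40))  # 60.0
```

Under these assumptions, the upper end of the reported range (120 pack-years) would correspond to, for example, three packs per day for 40 years.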

Bronchoscope Collection and Processing of Bronchial Ep- 
ithelium. A protocol was developed for harvesting viable 
bronchial epithelium from the lower respiratory tract using a 
standard cytology brush during bronchoscopy. After introduc- 
of a segmental orifice, usually the second and third bifurcation
within the upper and lower lobes, respectively, was brushed.
These sites were chosen because (a) they are high-deposition
areas for particles; (b) they are associated frequently with
histological changes in smokers; and (c) they represent sites
where tumors commonly occur (11, 23). The area was first 
washed with saline to remove any nonadherent cells. Sites were 
not brushed if a tumor was visualized within 5 cm of the site. 
After brushing, the brush was withdrawn, placed in serum-free 
medium, and kept on ice until processed. Each site was brushed 
twice. The procedure was well tolerated by all subjects, and no 
complications were noted related to the brushing procedure. 

Bronchial cells were collected from only two of the sites 
in two of the subjects, from three sites in two subjects, and from 
all four sites in the remaining subjects. Although only two sites 
were brushed initially in case 1, cells were obtained from all 
four sites in this subject during a repeat bronchoscopy per- 
formed after the initial procedure did not yield a diagnosis. 
Samples were obtained from all four sites in the cancer-free 
current smokers and in the never smokers. In addition, bron- 
chial epithelial cells derived at autopsy by Clonetics, Inc. (San 
Diego, CA) from four never smokers were also obtained to 
serve as additional controls. Only two sites sampled from most 
of the former uranium miners were available for analysis be-
cause cells recovered from the other sites had been used ex-
clusively for cytology in another investigation.4
Bronchial Epithelial Cell Culture. Replicative cultures of the
bronchial epithelial cells obtained by the procedure described
above were established in our laboratory (24) using a serum-
free medium (BEGM; Clonetics, Inc.) that is optimal for growth
of these cells. Cells were removed from brushes by vigorous
shaking in BEGM; cells from one brush were prepared for
cytological analyses, and cells from the other brush were
washed, resuspended in BEGM, seeded onto 60-mm fibronec-
tin-coated plates, and grown at 37°C in 3% CO2 and 21% O2
until 80% confluence. Prior to passage, aliquots of cells were
cryopreserved and stored at -145°C; other samples of cells
were fixed in methanol:acetic acid (3:1). Next, the cells were
washed four to six times in methanol:acetic acid and then
dropped onto slides (about 2 × 10^5 cells/slide). The effects of
cell culture on the frequency of trisomy 7 in nonmalignant
bronchial epithelium were examined by placing cells dispersed
from brushes directly onto microscope slides followed by fix-
ation.

Cytology. Cells from one brush from each bronchial collection
site were prepared for cytological analysis by smearing the cells
across a microscope slide. The cells were then fixed with 96%
ethanol and stained according to the Papanicolaou procedure
(25) to facilitate morphological evaluation by a cytopathologist.
Detection of Trisomy 2 and Trisomy 7. Trisomy 2 and tri-
somy 7 were determined by hybridization of cells with a bioti-
nylated chromosome 2 or 7 centromere probe (Oncor; Gaith-
ersburg, MD). The probes were denatured in hybridization
buffer at 70°C for 5 min, and the slides were immersed in
formamide-2X SSPE at 70°C for 2 min. The probe was then
applied to the slides, which were incubated in a humidified
chamber at 37°C for 16 h. After incubation, the slides were
washed in 0.25X SSPE (10 mM sodium phosphate monobasic
monohydrate; 1 mM ethylenediamine tetraacetic acid disodium



3 The abbreviations used are: NSCLC, non-small cell lung cancer; SCC, squa-
mous cell cancer; AC, adenocarcinoma; EGFR, epidermal growth factor receptor;
FISH, fluorescence in situ hybridization; LOH, loss of heterozygosity; BEGM,
Bronchial Epithelium Growth Medium.



4 Unpublished data. 
salt, dihydrate; 150 mM sodium chloride, pH 7.4) for 5 min at
72°C, and the probe was detected with fluorescein-labeled
avidin. Cell nuclei were visualized with propidium iodide.
Data Analysis. The number of centromeric hybridization sig-
nals in each cell was evaluated in 400 cells/slide, and the
frequency of trisomy 7 on each slide was calculated by dividing
the total number of cells expressing three hybridization signals
by the total number of cells counted on each slide. Twenty percent
of the slides were scored by a second person, and frequencies
for trisomy 7 differed by <0.4%. The total numbers of sites
positive for trisomy 7 in subjects with SCC and AC were
compared using Fisher's exact test.
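
The per-slide frequency described under Data Analysis amounts to a simple ratio. The sketch below illustrates that arithmetic only; the function names and the example slide are hypothetical and are not data from the study.

```python
from collections import Counter


def signal_frequencies(signals_per_cell):
    """Map each signal count (1, 2, 3, ...) to its percentage of scored cells."""
    if not signals_per_cell:
        raise ValueError("no cells scored")
    counts = Counter(signals_per_cell)
    total = len(signals_per_cell)
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


def trisomy_frequency(signals_per_cell):
    """Percentage of scored cells showing exactly three hybridization signals."""
    return signal_frequencies(signals_per_cell).get(3, 0.0)


# Hypothetical slide: 400 scored cells, 11 of them with three signals.
slide = [2] * 381 + [3] * 11 + [1] * 6 + [4] * 2
print(trisomy_frequency(slide))  # 2.75
```

Keeping the full distribution of signal counts (rather than only the three-signal fraction) also gives the one-, two-, and four-signal percentages of the kind reported for the control cells.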

Results 

Cytology. Squamous metaplasia and atypical glandular cells, 
the only cytological abnormalities observed in lung cancer 
patients, were present in 32% of the samples (Table 1). These 
cytological changes were observed in <10% of the cells re- 
covered from the diagnostic brush. Two subjects had three sites 
with cytological abnormalities, and five subjects had no cyto- 
logical abnormalities. No samples contained tumor cells by 
cytology, although one of four sites in five subjects was col- 
lected from the same lobe where a tumor was later diagnosed. 

Two of the 16 sites in smokers without lung cancer were 
cytologically abnormal (both in the same person; Table 2), 
whereas no atypical cells were present in the 12 sites from the 
three never smokers (Table 3). In former uranium miners, 
hyperplasia was present in bronchial cells collected from all 
four sites from one person, and in one site in two additional 
people (Table 2). 

Culturing of Bronchial Epithelial Cells. The efficiency of 
establishing replicative cultures of the cells obtained by bron- 
chial brushing was 100%. The serum-free medium used for 
these cultures is optimal for growing bronchial epithelial cells 
and does not support fibroblastic cell replication (25). There-
fore, the cells were uniformly epithelioid in appearance. Growth
potential was evaluated by passaging cells from all seven of the 
uranium miner cases and cases 1-6 from the lung cancer 
patients. Some of these cultures were maintained for up to nine 
passages (a minimum of 16 population doublings), and many 
underwent 30 divisions before senescence. However, none ex- 
hibited an indefinite population-doubling potential. 
Detection of Trisomy 7 in Nonmalignant Bronchial Epithe- 
lium. Background rates of trisomy 7 were determined by ex- 
amining normal human bronchial epithelial cell lines derived 
from autopsy cases of never smokers and bronchial epithelium 
collected from never smokers during bronchoscopy. In bron-
chial cell lines (passage 2) from four donors and bronchial
epithelial cell samples obtained by bronchial brushing from the 
recruited never smokers (Table 3), only 1.4 ± 0.3% (SD) of the 
cells contained three hybridization signals for chromosome 7 
with values ranging from 1 to 1.8%. These values agree with 
those reported by the manufacturer of the probe. Therefore, 
trisomy 7 frequencies of >2.0% (>2 SD above the mean for 
controls) were considered significantly different from controls. 
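
The cutoff stated above follows directly from the control statistics. As a minimal sketch, using the mean and SD quoted in the text (the function names are hypothetical):

```python
CONTROL_MEAN = 1.4  # % trisomy 7 in never-smoker cells, as stated in the text
CONTROL_SD = 0.3    # SD of the control frequencies, as stated in the text


def cutoff(mean=CONTROL_MEAN, sd=CONTROL_SD, n_sd=2.0):
    """Frequency (%) above which a site is called elevated: mean + n_sd * SD."""
    return mean + n_sd * sd


def is_elevated(freq_percent, threshold=None):
    """True if a site's trisomy 7 frequency exceeds the control-based cutoff."""
    threshold = cutoff() if threshold is None else threshold
    # Small tolerance so a value equal to the cutoff is not called elevated
    # because of floating-point rounding.
    return freq_percent > threshold + 1e-9


print(round(cutoff(), 2))                  # 2.0
print(is_elevated(2.0), is_elevated(2.3))  # False True
```

With these inputs, a site at exactly 2.0% is not counted as elevated, while the lowest positive value reported for the lung cancer patients (2.3%) is.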

Passage 1 or 2 bronchial cells from lung cancer patients 
were examined for trisomy 7. Eighteen of the 42 bronchial sites 
(43%) sampled from the 12 lung cancer patients contained 
trisomy 7 at frequencies ranging from 2.3 to 6.0% (Table 1; Fig. 
1). Three subjects (cases 1, 2, and 11) displayed trisomy 7 in all
sites collected during bronchoscopy, and in two subjects (cases 
7 and 12), trisomy 7 was found in three of four sites (Table 1). 
Six of the 18 sites positive for trisomy 7 also contained cyto- 
logically abnormal cells. Trisomy 7 was found in six of seven 

Table 1 Frequency of trisomy 7 in bronchial epithelial cells from lung cancer
patients

Case  Age  Smoking     Tumor      Brush       Cytological  Trisomy 7
           (pack-yrs)  diagnosis  location a  diagnosis    (frequency, %)

1     64   104         SCC        RLL         N            2.8
                                  RUL         AGC          4.0 b
                                  RLL c       N            3.0 b
                                  RUL c       N            4.0 b
                                  LLL c       N            6.0 b
                                  LUL c       SM           4.3 b
2     69   26          SCC        RUL         SM           2.8 b
                                  LLL         SM           3.3 b
                                  LUL         N            3.8 b
3     65   120         SCC        RLL         AGC          2.0
                                  RUL         AGC          2.3 b
                                  LLL         AGC          2.0
4     52   90          AC         RLL         SM           1.5
                                  RUL         N            1.8
                                  LLL         SM           1.5
                                  LUL         SM           1.8
5     70   50          SCC        RLL         N            1.5
                                  RUL         N            1.5
                                  LLL         N            1.5
                                  LUL         SM           1.3
6     61   93          AC         RLL         N            1.5
                                  RUL         N            1.3
                                  LLL         N            2.0
                                  LUL         N            1.5
7     58   40          SCC        RLL         N            1.8
                                  RUL         N            2.3 b
                                  LLL         N            2.5 b
                                  LUL         N            2.8 b
8     59   120         AdSCC      RLL         N            1.5
                                  RUL         N            2.0
                                  LLL         N            2.5 b
                                  LUL         AGC          2.0
9     65   71          SCC        RLL         SM           2.0
                                  RUL         SM           2.5 b
10    63   45          AC         RLL         N            1.0
                                  RUL         N            1.8
                                  LLL         N            1.8
                                  LUL         N            1.3
11    61   95          AC         LLL         N            2.5 b
                                  LUL         N            2.8 b
12    76   17          SCC        RLL         N            2.0
                                  RUL         N            2.3 b
                                  LLL         N            2.3 b
                                  LUL         N            2.3 b

a RLL, right lower lobe; RUL, right upper lobe; LLL, left lower lobe; LUL, left
upper lobe; AGC, atypical glandular cells; SM, squamous metaplasia; N, normal
cells; AdSCC, adenosquamous carcinoma.
b P < 0.05 as compared to never-smoker controls.
c Resampled 4 months later.



patients diagnosed with SCC, whereas only one of four patients
with AC displayed trisomy 7 in any site collected at bronchos-
copy. Case 8, which had histological features of both SCC and
AC, had one site positive for trisomy 7. The frequency of
positive trisomy 7 sites in all patients with SCC within this
small sample population was significantly greater than in AC
patients (P < 0.005).
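
A comparison of this kind can be sketched with Fisher's exact test on a 2 × 2 table of positive and negative sites. The counts below are an assumption: they are tallied from the site-level entries of Table 1 (15 of 24 SCC sites and 2 of 14 AC sites positive, counting case 1 as four sites and excluding the adenosquamous case), and the text does not say whether the authors used a one- or two-sided test. The function `fisher_exact` is a hypothetical helper written with the standard library only; it is not the authors' software.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (one_sided_p, two_sided_p). The one-sided value is the probability
    of observing `a` or more in the top-left cell given fixed margins; the
    two-sided value sums all tables no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(k) for k in range(a, hi + 1))
    two_sided = sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return one_sided, two_sided


# Assumed tallies from Table 1 (see lead-in): positive / negative sites.
scc_pos, scc_neg = 15, 9
ac_pos, ac_neg = 2, 12
one_sided, two_sided = fisher_exact(scc_pos, scc_neg, ac_pos, ac_neg)
print(f"one-sided P = {one_sided:.4f}, two-sided P = {two_sided:.4f}")
```

Because sites from the same patient are unlikely to be independent, a site-level test of this kind should be read as descriptive, which is in keeping with the authors' own caveat about the small sample population.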

The reproducibility of detecting trisomy 7 at sites found to 
be positive for this abnormality was investigated in one patient 
(case 1) who required repeat bronchoscopy for clinical reasons. 
Trisomy 7 was increased similarly in the two sites brushed 
during both procedures, although cytological examination 
showed atypical cells in one site from the first bronchoscopy 
and cytologically normal cells from the same site collected 

Table 2 Frequency of trisomy 7 in bronchial epithelial cells from cancer-free
smokers and former uranium miners

Case  Age  Smoking     Radon exposure  Brush     Cytological  Trisomy 7
           (pack-yrs)  (WLMs) a        location  diagnosis    (frequency, %)

13    81   15          0               RLL       N            1.8
                                       RUL       AGC          1.5
                                       LLL       N            1.8
                                       LUL       SM           2.0
14    34   24          0               RLL       N            1.3
                                       RUL       N            [illegible]
                                       LLL       N            1.0
                                       LUL       N            1.3
15    68   51          0               RLL       N            4.0 b
                                       RUL       N            3.0 b
                                       LLL       N            4.3 b
                                       LUL       N            3.5 b
16    45   30          0               RLL       N            1.3
                                       RUL       N            1.5
                                       LLL       N            2.0
                                       LUL       N            1.8
17    59   8           27              LLL       N            3.0 b
                                       LUL       N            3.0 b
18    65   9           [illegible]     LUL       [illegible]  1.3
                                       RUL       N            [illegible]
19    64   30          235             LUL       N            1.5
                                       RLL       N            1.0
20    56   0           186             LUL       N            2.0
                                       RLL       N            2.3 b
21    64   0           214             RLL       H            1.8
22    64   9           577             LUL       N            1.5
                                       RLL       H            0.8
23    67   31          124             LLL       H            1.3
                                       LUL       H            2.8 b
                                       RLL       H            [illegible]
                                       RUL       H            [illegible] b

a Abbreviations are as indicated in Table 1 footnote. WLM, working level month;
H, hyperplasia.
b P < 0.05 as compared to never-smoker controls.



during the second procedure (Table 1). The other two sites 
collected during the second bronchoscopy also showed elevated 
frequencies of trisomy 7 in this patient.

Trisomy 7 was detected in cytologically normal bronchial 
epithelium collected from four sites in one (case 15) of the 
cancer-free smokers (Table 2). Bronchial cells from the other
smokers did not contain this chromosome abnormality. In the
former uranium miners (cases 17-23), seven of 15 sites col-
lected during bronchoscopy were positive for trisomy 7. Three
of the positive sites
contained basal cell hyperplasia. However, the other four sam-
ples positive for trisomy 7 showed no cytological abnormality.

Two of the former uranium miners (cases 18 and 23) 
developed lung cancer within 2 years of bronchial cell collec-
tion. SCC was diagnosed in the right upper lobe of both sub-
jects. As noted in Table 2, both cases were positive for trisomy
7 in the right upper lobe brushing site obtained at the initial 
bronchoscopy. 

Tissue Culture Effects on Trisomy 7 Expression in Bron- 
chial Epithelium. The effect of tissue culture on trisomy 7 
frequency was assessed by comparing the frequency of this 
chromosome abnormality in freshly isolated bronchial epithe- 
lium obtained directly from bronchial brushes ("preculture") to 
passage 1 cells. This comparison was conducted on cells col- 
lected from two different bronchial sites in three different 
subjects (cases 11 and 16 and donor 7 [never smoker]). Cul-
tured samples positive for trisomy 7 in case 11 were also



Table 3 Interphase analysis of chromosome 7 in normal human bronchial
epithelial cells

Bronchial epithelial cell lines were established from never smokers (Clonetics)
after autopsy and from volunteers. The normal distribution of chromosome 7 copy
number as detected by FISH is shown by the percentage of cells exhibiting 1, 2,
3, or 4 hybridization signals. Four hundred cells containing hybridization signals
were counted per donor.

                              Number of hybridization signals/cell (%)
Donor  Age  Brush location    1      2      3      4

1      6    NA a              3.5    92.0   1.5    3.0
2      17   NA                2.3    95.5   1.3    1.0
3      15   NA                1.5    94.7   1.8    2.0
4      41   NA                2.0    94.8   1.0    2.3
5      45   RLL               1.0    95.5   1.8    1.7
            RUL               0.5    98.3   1.0    0.2
            LLL               1.3    96.5   1.0    1.2
            LUL               1.0    96.3   1.2    1.5
6      35   RLL               1.0    96.8   1.0    1.2
            RUL               2.5    93.3   1.7    2.5
            LLL               2.0    94.8   1.3    1.7
            LUL               1.8    94.2   1.8    2.2
7      33   RLL               0.5    98.2   0.8    0.5
            RUL               0.5    97.2   1.3    1.0
            LLL               1.2    96.8   1.3    0.7
            LUL               1.0    96.0   1.3    1.5

a Abbreviations are as indicated in the legend to Table 1. NA, not applicable.



positive in preculture cells from the same bronchial collection 
site, whereas sites negative for trisomy 7 in cultured cells from
case 16 and the never smoker were also negative in preculture
cells (data not shown). Values for trisomy 7 differed by <0.3% 
between preculture and cultured cells. The effect of passaging 
cells on the frequency of trisomy 7 was also examined in 
bronchial cells from case 1. Trisomy 7 frequency was similar in 
cells from passages 1, 4, and 7. 

Frequency of Trisomy 2 in Nonmalignant Bronchial Epi-
thelium. Aneuploidy has been detected in bronchial squamous
metaplasia, a likely precursor to SCC (26). To determine
whether the increased frequency of trisomy 7 detected in the
current study reflects generalized aneuploidy or a specific chro-
mosomal duplication, a subgroup of samples was evaluated for
trisomy of chromosome 2. The frequency of trisomy 2 in never
smokers was 1.5 ± 0.4% (data not shown). Bronchial cells
from eight subjects, six of whom had elevated frequencies for
trisomy 7, were evaluated. The frequency for trisomy of chro-
mosome 2 did not differ from never smokers (Table 4).



Discussion 

The studies described in this report are the first to detect and 
quantify an increase in trisomy 7 in the airway cells of subjects 
at risk for lung cancer. The presence of trisomy 7 appeared to 
be a specific chromosome gain and not due to generalized 
aneuploidy in these cells. In addition, trisomy 7 in nonmalig- 
nant epithelium from lung cancer patients was associated with 
SCC tumor histology, suggesting that patients with this genetic 
change may be at greater risk for developing SCC than other
histological forms of lung cancer. This supposition was sup- 
ported by the fact that two cancer-free former uranium miners 
with bronchial cells positive for trisomy 7 ultimately developed 
SCC. Finally, these results demonstrate the ability to detect 
genetic changes in cytologically normal cells, suggesting that 
molecular analyses may enhance the power for detecting 
Fig. 1. FISH for chromosome 7 in
bronchial epithelial cells. Trisomy 7
is apparent in one cell from this
field. Magnification. XS30. 



Table 4 Frequency of trisomy 2 in bronchial epithelial cells from lung cancer
patients, cancer-free smokers, and former uranium miners

[The Case and Tumor diagnosis columns are only partly legible in the scan:
tumor diagnoses SCC, SCC, SCC, None, None, None, None; case numbers 13, 15, 19.
The grouping of rows by case is not recoverable.]

Brush location a   Trisomy 2 (frequency, %)

RLL                1.5
RUL                1.8
LLL                1.8
LUL                1.0
LLL                1.0
LUL                1.0
RLL                1.5
RUL                2.1
LLL                1.8
LUL                1.5
RLL                0.3
RUL                1.5
LLL                0.8
RLL                1.0
RUL                0.8
LLL                1.0
LUL                [illegible]
RLL                1.8
RUL                2.0
LLL                1.0
LUL                1.3
LUL                1.9
RLL                0.8
RLL                1.5

a Abbreviations are as indicated in legend to Table 1.



premalignant changes in bronchial epithelium in high-risk
individuals. 

Cigarette smoking and the exposure of underground min-
ers to radon progeny are both well-established causes of respiratory
cancer (27, 28). Tobacco smoke contains numerous muta-
gens and carcinogens, and radon progeny that have been
inhaled and deposited on the respiratory epithelium release α
particles capable of damaging DNA (28). Although comparison
between findings in the cigarette smokers and the former ura-
nium miners is constrained by the number of participants in the
two groups, trisomy 7 was found in both groups. These results
are consistent with the synergism between smoking and radon 
progeny, which suggests commonality in the pathways by 
which the two carcinogens cause lung cancer (29). 

The bronchial brushing method used for collecting cells
from the lower respiratory tract is rapid (10-12 min total for
two brushes at four different sites), well tolerated by the patient, 
and permits collection of viable bronchial cells that can be 
expanded through tissue culture at 100% efficiency. The sta- 
bility of these cells in culture was evident by the fact that the 
frequency of trisomy 7 did not differ between primary brush 
cells and cells propagated for up to seven passages. Further- 
more, this procedure is amenable to the production of sufficient 
cell numbers (1 X 10 s ) at low passage (one or two) to accom- 
modate multiple molecular analyses. Although the media used 
in culturing of bronchial epithelial cells did not appear to 
provide a selective growth advantage to cells with an
additional chromosome 7, the modulation of medium supple-
ments might lead to the establishment of clonal populations of
premalignant cells. Such cell populations would greatly facil- 
itate the identification of additional early gene changes in 
respiratory carcinogenesis. 

The detection of trisomy 7 in multiple nonmalignant sites 
within the bronchial tree supports the theory of field cancer-
ization (5), which states that diffuse exposure of the entire
respiratory tract to inhaled carcinogens causes the development
of multiple, independently initiated sites that can lead to tumor
development. Although the frequency of this chromosome ab- 
normality was relatively low (2.3-6.0%), these values were 
consistent with the low percentage of cells within each brush 
sample (10%) that exhibited abnormal cytology. These results 
are also similar to studies of chromosome gain in patients with 
head and neck cancer where trisomy 7 was detected at frequen-
cies of 2, 3, and 21% in histologically normal, hyperplastic, and
dysplastic cells, respectively (30). 

The detection of trisomy 7 in normal, hyperplastic, and 
metaplastic bronchial epithelium from cancer-free patients ex- 
tends a recent report describing LOH at chromosomes 3p, 5q, 
and 9p in dysplastic premalignant bronchial lesions harvested 
from current and former smokers by bronchoscopy (31). The 
inability to detect LOH at these chromosome loci in normal or 
early premalignant epithelium may stem from a difference in 
sensitivity between the methodologies used. The low frequency 
of trisomy 7 and cytologically abnormal cells collected from 
bronchoscopy is consistent with a lack of clonality within the
brush cells. FISH assays on interphase cells permit screening of 
individual cells, and sensitivity for detection is limited only by 
the number of cells examined. In contrast, microsatellite anal- 
yses for LOH cannot detect nonclonal changes but require that 
the chromosome alteration be present in approximately 40- 
50% of the sample (32, 33). 
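
The dependence of detection sensitivity on the number of cells examined can be illustrated with the standard error of an observed frequency. This is a sketch under a simplifying assumption that is not made in the text: each scored cell is treated as an independent draw with the same probability of showing three signals (a binomial model). The function name is hypothetical.

```python
from math import sqrt


def frequency_standard_error(p, n_cells):
    """Standard error of an observed frequency under a binomial model.

    p is the underlying fraction of trisomic cells (0..1); n_cells is the
    number of cells scored on the slide.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return sqrt(p * (1.0 - p) / n_cells)


# Background frequency of 1.4% (from the text) at several hypothetical
# slide sizes; 400 cells/slide is the number scored in the study.
for n in (100, 400, 1600):
    se = frequency_standard_error(0.014, n)
    print(n, f"{100 * se:.2f}%")
```

Under this model the sampling error falls with the square root of the number of cells scored, so quadrupling the cell count halves it.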

The role of trisomy 7 in lung cancer development has not 
been elucidated. Increased expression of EGFR, which is lo- 
cated on chromosome 7 (34), is observed in 50-80% of 
NSCLCs (16, 35, 36). EGFR expression appears greater in SCC 
than AC (35, 36) and is amplified in some cell lines derived 
from SCC (37). These findings corroborate our hypothesis that 
acquisition of trisomy 7 in bronchial epithelium could be prog- 
nostic for development of SCC. Moreover, expression of this 
gene is also increased in nonmalignant bronchial epithelium 
from NSCLC patients (16, 35) and in normal or premalignant 
epithelium adjacent to head and neck tumors (38). Thus, altered 
expression of EGFR could enable cells that have acquired 
additional genetic changes to proliferate continually and escape 
from terminal differentiation (39). In addition, the c-met onco- 
gene is also located on chromosome 7 and is overexpressed in 
NSCLCs (40, 41). This oncogene encodes a transmembrane 
tyrosine kinase (42) that functions as a receptor for the hepa-
tocyte growth factor (43) and is involved in sustaining the
growth of NSCLC cells in culture (44). 

Previous studies have detected mutations in p53 (12, 14, 
35), chromosome losses at 9p21 (45) and 3p (46) in preinvasive 
bronchial lesions, and simple chromosome rearrangements in
normal bronchial epithelium from proximal airways (47) of 
lung cancer patients. The prevalence of these genetic changes in 
normal epithelium from persons at risk for lung cancer should 
be quantified by FISH to define the temporal sequences of 
somatic genetic changes that precede the development of clonal 
lesions in the lung. This information will be invaluable in 
providing biological markers that can qualitatively estimate the 
extent of field cancerization in persons at risk for lung cancer
and can be used to assess the efficacy of chemointervention
trials. Ultimately, the efficiency for detecting these biological 
markers in bronchial epithelium versus exfoliated epithelial 
cells within sputum must be established to support the use of a 
"genetic-based" screening approach for individuals at high risk 
for lung cancer. The results of the current investigation have 
identified one potential biomarker, trisomy 7, that may be 
useful in early detection and intervention for lung carcino- 
genesis. 
From genes to protein structure and function:
novel applications of computational
approaches in the genomic era

Jeffrey Skolnick and Jacquelyn S. Fetrow 

The genome-sequencing projects are providing a detailed 'parts list' of life. A key to comprehending this list is understanding 
the function of each gene and each protein at various levels. Sequence-based methods for function prediction are inadequate 
because of the multifunctional nature of proteins. However, just knowing the structure of the protein is also insufficient for 
prediction of multiple functional sites. Structural descriptors for protein functional sites are crucial for unlocking the secrets 
in both the sequence and structural-genomics projects. 



Genome-sequencing projects are providing a
detailed 'parts list' for life. Unfortunately, this list,
a portion of which represents the amino acid
sequence of all the proteins in a given genome, does
not come with an instruction manual. That is, given
the genome's sequences, one does not necessarily know
straight away which regions encode proteins, which 
serve a regulatory role and which are responsible for 
the structure and replication of the DNA itself. 

This is not unlike giving a child a list of parts nec- 
essary to create a working automobile. Without the 
necessary expertise, creating the final, working car from
just the initial parts list is a nearly impossible task. Simi- 
larly, understanding how to create a complete, func- 
tioning cell given just the sequence of nucleotides 
found in an organism's genome is a complex problem. 

What is a protein function?

After a genome is sequenced and its complete parts
list determined, the next goal is to understand the func-
tion(s) of each part, including all of the proteins. What
do we mean by protein function, the focus of this article?

Function has many meanings. At one level, the pro-
tein could be a globular protein, such as an enzyme,
hormone or antibody, or it could be a structural or
membrane-bound protein. Another level is its bio-
chemical function, such as the chemical reaction and
the substrate specificity of an enzyme. The regulatory
molecules or cofactors that bind to a protein are also
levels of biochemical function.

At the cellular level, the protein's function would
involve its interaction with other macromolecules and
the function and cellular location of such complexes.
There is also the protein's physiological function; that
is, in which metabolic pathway the protein is involved
or what physiological role it performs in the organism.
Finally, the phenotypic function is the role played by
the protein in the total organism, which is observed by
deleting or mutating the gene encoding the protein.

Obviously, the complete characterization of protein
function is difficult but efforts are under way at all levels 1-4,
including cellular function 5,6. In this article, however,
we focus on identifying the biochemical function of a
protein given its sequence, a problem that is amenable to
molecular approaches.

Sequence-based approaches to function
prediction

The sequence-to-function approach is the most com-
monly used function-prediction method. This robust
field is well developed and, in the interest of space
limitations, we will merely present a brief overview.

There are two main flavors of this approach: sequence
alignment 7-9; and sequence-motif methods such as
Prosite 10, Blocks 11, Prints 12,13 and Emotif 14. Both the
alignment and the motif methods are powerful but a
recent analysis has demonstrated their significant limi-
tations 15, suggesting that these methods will increasingly
fail as the protein-sequence databases become more
diverse.

An extension of these approaches that combines 
protein-sequence with structural information has been
developed and some successes have been reported 16.
However, this method still applies the structural infor-
mation in a one-dimensional, 'sequence-like' fashion
and fails to take into account the powerful three-
dimensional information displayed by protein structures.

In addition, proteins can gain and lose function dur-
ing evolution and may, indeed, have multiple functions
in the cell (Box 1). Sequence-to-function methods
cannot specifically identify these complexities. Inaccu-
rate use of sequence-to-function methods has led to
significant function-annotation errors in the sequence
databases 17.

An alternative approach 

An alternative, complementary approach to protein-
function prediction uses the sequence-to-structure-to-
function paradigm. Here, the goal is to determine the
structure of the protein of interest and then to identify
the functionally important residues in that structure.
Using the chemical structure itself to identify functional
sites is more in line with how the protein actually works.



In a sense, this is one long-term goal of 'structural
genomics' projects, which are designed to deter-
mine all possible protein folds experimentally, just
as genome-sequencing projects are determining all 

structural-biology approaches, in which one knows the 
protein's function first and only then, if the function is
sufficiently important, determines its structure. 

It is implicitly assumed that having the protein's struc-
ture will provide insights into its function, thereby fur-
thering the goals of the human-genome-sequencing
project. However, knowing a protein's three-dimensional
structure is insufficient to determine its function 
(Box 2). What we really need to analyse and predict the 
multifunctional aspects of proteins is a method spe- 
cifically to recognize active sites and binding regions in 
these protein structures. 



Active-site identification 

In order to use a structure-based approach to function 
prediction, one must identify the key residues respon-
sible for a given biochemical activity. For many years,
it has been suggested that the active sites in proteins are
better conserved than the overall fold. Taken to the
limit, this suggests that one could not only identify dis-
tant ancestors with the same global fold and the same
activity but also proteins with similar functions but 
distantly related, or possibly unrelated, global folds. 

The validity of this suggestion was demonstrated 
empirically by Nussinov and co-workers, who showed 
that the active sites of eukaryotic serine proteases, sub- 
tilisins and sulfhydryl proteases exhibit similar structural 
motifs 21 . Furthermore, in a recent modeling study of 
Saccharomyces cerevisiae proteins, protein functional sites
were found to be more conserved than other parts of
the protein models 22. Similarly, it has been demon-
strated that the catalytic triad of the α/β hydrolases
is structurally better conserved than other histidine-
containing triads 23. A comparison of the structure of the
hydrolase catalytic triad to other histidine-containing
triads shows a distinct bimodal distribution, while a
similar analysis done with a randomly selected triad shows
a unimodal distribution (Fig. 1).
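
The triad comparison described above can be sketched in a few lines. This is a minimal illustration only, not the procedure of the cited study: it scores two three-residue sites by the root-mean-square difference of their internal pairwise distances (a distance RMSD, which needs no superposition), whereas the published analysis structurally aligned the triads. The function names and the binning helper are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Distances between every pair of 3D points, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def distance_rmsd(site_a, site_b):
    """Distance RMSD between two equally sized, equally ordered sites.

    Each site is a list of (x, y, z) tuples, e.g. one representative
    atom per residue of a His-Ser-Asp triad.
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must contain the same number of points")
    da = pairwise_distances(site_a)
    db = pairwise_distances(site_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def histogram(values, bin_width):
    """Count values per bin; the key k covers [k * bin_width, (k + 1) * bin_width)."""
    return dict(Counter(int(v // bin_width) for v in values))
```

Scoring a reference triad against every histidine-containing triad in a structure set and binning the scores with `histogram` would give a plot of the same general kind as Fig. 1, although the numerical values would differ from those obtained by structural alignment.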

Kasuya and Thornton 24 generalized this example by 
creating structural analogs of a few Prosite sequence
motifs 10. For the 20 most-frequently occurring Prosite
patterns, the associated local structure is quite distinct.
These results provide clear evidence that enzyme active
sites are indeed more highly conserved than other parts
of the protein.

Identifying active sites in experimental structures 

Historically, several groups have attempted to iden- 
tify functional sites in proteins; these efforts were 
directed at protein engineering or building functional 
sites in places where they did not previously exist. This 
has been successfully accomplished for several metal-
binding sites 2 *" 15 . However, highly accurate functional-
site descriptors of the backbone and side-chain atoms were
required, fueling the belief that significant atomic detail
is required in site descriptors for function identification.

Highly detailed residue side-chain descriptors of the 
active sites of serine proteases and related proteins have 
been used to identify functional sites 5 . The use of these 
highly detailed motifs has led to the identification of



Box 1. Proteins are multifunctional



A common protein characteristic that makes functional analysis based 
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strate and zinc, and performs a redox reaction. Each of these occurs 
at different functional sites that are in close proximity and the combi-
nation of all four sites creates the fully functional protein.

Other examples of multifunctional proteins are the nucleic-acid-binding
proteins. For instance, DNA regulatory proteins often contain a DNA-
binding domain, a multimerization domain and additional sites that bind
regulatory proteins; a classic example is RecA M . The 3C rhinovirus
protease exhibits a proteolytic function as well as an RNA-binding
function 60,61. Transcription factors are also complex, multifunctional
proteins 62 . It is becoming increasingly important to recognize each of 
these different functions of gene products of a newly sequenced gene. 

The serine-threonine-phosphatase superfamily is a prime example of
the difficulties of using standard sequence analysis to recognize the
multiple functions found in single proteins. This large protein family is
divided into a number of subfamilies, all of which contain an essential
phosphatase active site. Subfamilies 1, 2A and 2B exhibit 40% or more
sequence identity between them 63. However, each of these subfamilies
is apparently regulated differently in the cell 64-67 and observation sug-
gests that there are different functional sites at which regulation can
occur. Because the sequence identity between subfamilies is so high,
standard sequence-similarity methods could easily misclassify new
sequences as members of the wrong subfamily if the functional sites
are not carefully considered, as was recently demonstrated 43.

These are but a few examples of the multifunctionality of proteins.
The recognition of this multifunctional nature is of critical importance 
to the genomics field. Useful functional-annotation methods must con- 
sider all of the specific functions in a given protein and will not just 
provide a general classification of function. 



several novel functional sites in known, high-quality 
protein structures 3 -". More automated methods for 
finding spatial motifs in protein structures have also 
been described 21 - 54- * 0 . 

Unfortunately, most of these methods require the 
exact placement of atoms within protein backbones and 
side chains, and so have not been shown to be relevant 
to inexact predicted structures. Recently, however, we
described the production of fuzzy, inexact descriptors
of protein functional sites 15. As we wish to apply the
descriptors to experimental structures as well as to pre-
dicted protein models, we used only α-carbon atoms and
side-chain centers-of-mass positions. We call these
descriptors 'fuzzy functional forms' (FFFs) and have
created them for both the disulfide-oxidoreductase 15,41
and α/β-hydrolase catalytic active sites 23.
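
To make the idea of an inexact descriptor concrete, here is a minimal sketch, assuming a descriptor can be reduced to a set of residue types plus allowed ranges for the distances between their α-carbons. The descriptor layout, the name `TOY_FFF` and every distance range are hypothetical placeholders chosen for illustration; they are not the published FFF parameters, and the real descriptors also use side-chain centers of mass.

```python
import math
from itertools import product

# Hypothetical descriptor: residue types plus allowed CA-CA distance
# ranges in angstroms. All numbers are placeholders, not published values.
TOY_FFF = {
    "residues": ("CYS", "CYS", "PRO"),
    "distance_ranges": {
        (0, 1): (4.5, 6.5),
        (0, 2): (5.0, 11.0),
        (1, 2): (2.5, 9.0),
    },
}


def match_descriptor(ca_atoms, descriptor):
    """Return index tuples of residues that satisfy the descriptor.

    ca_atoms is a list of (residue_name, (x, y, z)) entries, one per
    residue, holding only alpha-carbon coordinates.
    """
    candidates = [
        [i for i, (name, _) in enumerate(ca_atoms) if name == wanted]
        for wanted in descriptor["residues"]
    ]
    hits = []
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # one residue cannot fill two descriptor positions
        if all(
            low <= math.dist(ca_atoms[combo[a]][1], ca_atoms[combo[b]][1]) <= high
            for (a, b), (low, high) in descriptor["distance_ranges"].items()
        ):
            hits.append(combo)
    return hits
```

Because only ranges of α-carbon distances are tested, a match of this kind tolerates the coordinate errors of a crude model, which is the property the text relies on when applying descriptors to predicted structures.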

The disulfide-oxidoreductase FFF was applied to 
screen high-resolution structures from the Brookhaven 
protein database 42. In a dataset of 364 protein structures,
the FFF accurately identified all proteins known to
exhibit the disulfide-oxidoreductase active site 15. In a
larger dataset of 1501 proteins, the FFF again accurately
identified all proteins with the active site. In addition,
it identified another protein, 1fjm, a serine-threonine
phosphatase. This result was initially discouraging but
subsequent sequence alignment and clustering analysis
strongly suggested that this putative site might indeed
be a site of redox regulation in the serine-threonine
phosphatase-1 subfamily 43. If confirmed by experiment,
this result will highlight the advantages of using struc-
tural descriptors to analyse multiple functional sites in
proteins. It will also highlight the fact that human



Box 2. Knowing a protein's structure does not necessarily 
tell you its function
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thing about its function. The most well-studied example is the (a/p) 8 
barrel enzymes, of which triose-phosphate isomerase (TIM) is the arche- 
typal representative. Members of this family have similar overall struc- 
tures but different functions, including different active sites, substrate 
specificities and cofactor requirements 70 71 . 

Is this example common? Our own analysis of the 1997 SCOP data-
base 68 shows that the five largest fold families are the ferredoxin-
like, the (α/β) barrels, the knottins, the immunoglobulin-like and the
flavodoxin-like fold families with 22, 18, 13, 9 and 9 subfamilies, respec-
tively (Fig. I). In fact, 57 of the SCOP fold families consist of multiple
superfamilies. These data only show the tip of the iceberg, because
each superfamily is further composed of protein families, and each indi-
vidual family can have radically different functions. For example, the
ferredoxin-like superfamily contains families identified as Fe-S ferredoxins,
ribosomal proteins, DNA-binding proteins and phosphatases, among
others. 

After this article was submitted, a much more-detailed analysis of the
SCOP database was published 72. This finds a broad function-structure
correlation for some structural classes, but also finds a number of
ubiquitous functions and structures that occur across a number of fam- 
ilies. The article provides a useful analysis of the confidence with which 
structure and function can be correlated 72 . Knowing the protein struc- 
ture by itself is insufficient to annotate a number of functional classes 
and is also insufficient for annotating the specific details of protein 
function. 
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Figure I — 

Histogram of the numbers of superfamilies found in each SCOP fold family.
These data clearly show that proteins with similar structures can have different 
functions and demonstrate the difficulty of assigning protein function based 
simply on the three-dimensional structure. The data were taken from the 1997 
distribution of SCOP (http://scop.mrc-lmb.cam.ac.uk/scop). For a more-detailed
analysis, see Ref. 72. 



observation alone is no longer adequate for identifying 
all functional sites in known protein structures. 

To date, the use of structure to identify function has
largely focused on high-resolution structures and highly
detailed descriptors of protein functional sites. How-
ever, the creation of inexact descriptors for functional
sites opens the way to the application of these methods
to inexact, predicted protein models. The question
remains: how good does a model have to be in order
to use FFFs to identify its active sites?



The state of the art in structure-prediction
methods 

For proteins whose sequence identity is above ~30%, 
one can use homology modeling to build the struc- 

for proteins that are not homologous to proteins with
known structure. At present, there are two approaches for
these sequences: ab initio folding 45-4 * and threading 4 ' 1 " 53 .

In ab initio folding, one starts from a random confor- 
mation and then attempts to assemble the native struc-
ture. As this method does not rely on a library of
pre-existing folds, it can be used to predict novel
folds. The recent CASP3 protein-structure-prediction
experiment (http://PredictionCenter.llnl.gov/CASP3)
involved the blind prediction of the structure of pro-
teins whose actual structure was about to be experi-
mentally determined. These results indicate that con-
siderable progress has been made 40 '* 4 . For helical and
α/β proteins with less than 110 residues, structures
were often predicted whose backbone root-mean-
square deviation (RMSD) from native ranged from
4-7 Å. Progress is being made with the β proteins, too,
although they remain problematic. Because ab initio
methods can identify novel folds, these methods could
be used to help to select sequences likely to yield novel
folds in experimental structural-genomics projects.
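
For reference, the backbone RMSD quoted above is conventionally defined as follows. The text does not spell out the formula, so this is the standard definition and is stated here as an assumption about the measure used. N is the number of equivalent backbone atoms, and the model and native coordinates are compared after optimal rigid-body superposition.

```latex
\mathrm{RMSD} \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left\lVert \mathbf{r}_i^{\mathrm{model}} - \mathbf{r}_i^{\mathrm{native}} \right\rVert^{2}}
```

On this scale, a value of 4-7 Å means that corresponding backbone atoms are displaced by several ångströms on average, which is why such models are described as inexact.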

Another approach to tertiary-structure prediction is 
threading. Here, for the sequence of interest, one
attempts to find the closest matching structure in a
library of known folds 5 - 55 . Threading is applicable to
proteins of up to 500 residues or so and is much faster
than ab initio approaches. However, threading cannot 
be used to obtain novel folds. 

Ab initio predicted models can be used for automatic 
protein-function prediction 

The results of the recent CASP3 competition sug-
gest that current modeling methods can often (but not
always) create inexact protein models. Are these struc-
tures useful for identifying functional sites in proteins?
Using the ab initio structure-prediction program
MONSSTER, the tertiary structure of a glutaredoxin,
1ego, was predicted 56. For the lowest-energy model,
the overall backbone RMSD from the crystal structure
was 5.7 Å.

To determine whether this inexact model could be
used for function identification, the sets of correctly
and incorrectly folded structures were screened with
the FFF for disulfide-oxidoreductase activity 15. The
FFF uniquely identified the active site in the correctly
folded structure but not in the incorrectly folded ones
(Fig. 2). This is a proof-of-principle demonstration that
inexact models produced by ab initio prediction of
structure from sequence can be used for the subsequent
prediction of biochemical function. Of course, improve-
ments in the method have to be made before such
predictions can be done on a routine basis.



Use of predicted structures from threading in 
protein-function prediction 

At present, practical limitations preclude folding an
entire genome of proteins using ab initio methods 57.
Threading is more appropriate for achieving the requisite
high-throughput structure prediction. Thus, a stand-
ard threading algorithm 58 has been used to screen
proteins in nine genomes for the disulfide-oxidoreductase
active site described above.

First, sequences that aligned with the structures of
known disulfide oxidoreductases were identified. Then,

site residues and geometry. For those sequences for
which other homologs were available, a sequence-
conservation profile was constructed 21. If the putative
active-site residues were not conserved in the sequence
subfamily to which the protein belongs, that sequence
was eliminated. Otherwise, the sequence is predicted
to have the function.
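
The conservation filter in this procedure can be sketched as follows. This is an illustrative reading only: the text does not say how the conservation profile was computed or what cut-off was applied, so the per-column identity fraction, the `min_fraction` threshold and the function names are all assumptions.

```python
def site_conservation(aligned_seqs, column, residue):
    """Fraction of aligned subfamily sequences carrying `residue` at `column`."""
    return sum(1 for seq in aligned_seqs if seq[column] == residue) / len(aligned_seqs)


def passes_conservation_filter(aligned_seqs, site_columns, query_index=0,
                               min_fraction=0.9):
    """Keep the query only if its putative active-site residues are conserved.

    aligned_seqs: equal-length, pre-aligned sequences of one subfamily.
    site_columns: alignment columns of the putative active-site residues.
    min_fraction: hypothetical cut-off; the source gives no value.
    """
    query = aligned_seqs[query_index]
    return all(
        site_conservation(aligned_seqs, col, query[col]) >= min_fraction
        for col in site_columns
    )
```

A sequence whose putative site residues vary across its own subfamily is dropped, which corresponds to the elimination step described above; any real application would need a weighted profile and a justified threshold.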

Using this sequence-to-structure-to-function method,
99% of the proteins in the nine genomes that have
known disulfide-oxidoreductase activity have been
found. From 10% to 30% more functional predictions
are made than by alternative sequence-based approaches;
similar results are seen for the α/β hydrolases 1 - 1 . Sur-
prisingly, in spite of the fact that threading algorithms
have problems generating good sequence-to-structure
alignments, active sites are often accurately aligned,
even for very distant matches. This observation would
agree with the above experimental results indicating
that active sites are well conserved in protein structures.

Importantly, the false-positive rate when using struc-
tural information is much lower than that found using
sequence-based approaches, as demonstrated by a
detailed comparison of the FFF structural approach and
the Blocks sequence-motif approach (N. Siew et al.,
unpublished). In this study, the sequences in eight
genomes, including Bacillus subtilis, were analysed for
disulfide-oxidoreductase function using the disulfide-
oxidoreductase FFF, the thioredoxin Block 00194 and
the glutaredoxin Block 00195. If we assume that those
sequences identified by both the FFF and Blocks
are 'true positives', we find 13 such sequences in the
B. subtilis genome.

There is no experimental evidence validating all of 
these 'true positives' and so they are more accurately 
termed 'consensus positives'. In order to find these 13
'consensus positive' sequences, the FFF hits seven false
positives. On the other hand, Blocks hits 23 false
positives (Fig. 3). It was previously suggested that the
use of a functional requirement adds information to 
threading and reduces the number of false positives". 
These data, including the data shown in Fig. 3, validate 
this claim on a genome-wide basis.
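
The counts quoted above for B. subtilis can be turned into a simple comparison. The sketch below treats the 13 'consensus positives' as correct hits, as the text itself assumes, and computes the fraction of each method's hits that are consensus positives; the function name is chosen for this example only.

```python
def consensus_precision(consensus_positives, false_positives):
    """Fraction of a method's hits that are consensus positives."""
    return consensus_positives / (consensus_positives + false_positives)


# Counts taken from the text: 13 consensus positives for both methods,
# 7 false positives for the FFF and 23 for Blocks.
fff_precision = consensus_precision(13, 7)
blocks_precision = consensus_precision(13, 23)

print(f"FFF:    {fff_precision:.2f}")
print(f"Blocks: {blocks_precision:.2f}")
```

On these figures, 13 of 20 FFF hits and 13 of 36 Blocks hits are consensus positives. Since the consensus set has not been validated experimentally, these ratios compare the two methods with each other rather than measure true accuracy.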

Of course, as no genome has had the function of all 
of its proteins experimentally annotated, it is imposs-
ible to know how many other proteins with the speci-
fied biochemical function were not properly identified.
This is a critical question for researchers attempting to
predict protein function. Experimental confirmation
will be needed to validate this or any other method
fully. This points out the need for closely coupling
computational function-prediction algorithms with 
experiments. 

Weaknesses of using the sequence-to-structure-
to-function method of function prediction

Based on studies to date, the identification of enzy-
matic activity requires a model in which the backbone
RMSD from native near the active sites is about 4-5 Å.
Predicted models are better at describing the geometry
in the core of the molecule than in the loops and so



[Figure 1 graphic not reproduced: panels (a) and (b), each a histogram with x axis 'Root-mean-square distribution' (ticks 0.5, 1, 1.5); y-axis label illegible.]

Trends in Biotechnology



Figure 1 

The distribution of root-mean-square deviations (RMSD) between the hydrolase
catalytic triad and all other histidine-containing triads shows a bimodal distribution
(a); by contrast, the RMSD between a randomly selected (noncatalytic) triad and all
other histidine-containing triads has a unimodal distribution (b). The His-Ser-Asp
catalytic triad in the protein 1gpl (Rp2 lipase) (a) and a random histidine-containing
triad from 4pga (glutaminase-asparaginase) (b) were structurally aligned to all His-
containing triads in a database of 1037 proteins. Actual α/β-hydrolase active sites
(a) and the 4pga site (b) are indicated by blue bars; other histidine triads that are
not active sites are indicated by red bars. None of the sites found by matching to the
4pga were hydrolase active sites. Inset graphs show the full distribution.

predicting the function of a protein whose active site is 
in loops may be a problem. Also, the method can cur-
rently only be applied to enzyme active sites; substrate-
and ligand-binding sites have not been identified using
the inexact models. Techniques that will further refine
inexact protein models will be quite useful in taking
the protein analysis to the next step. 

Conclusions 

Although sequence-based approaches to protein- 
function prediction have proved to be very useful, alter- 
natives are needed to assign the biochemical function 
of the 30-50% of proteins whose function cannot be 
assigned by any current methods. One emerging 
approach involves the sequence-to-structure-to-function
paradigm. Such structures might be provided by struc-
tural-genomics projects or by structure-prediction
algorithms. Functional assignment is made by screen-
ing the resulting structure against a library of structural
descriptors for known active sites or binding regions.
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Figure 2 

Application of the disulfide-oxidoreductase fuzzy functional form (FFF) to ab initio
models of glutaredoxin created by the program MONSSTER shows that the FFF can
distinguish between correctly folded and misfolded (or higher-energy) models. The FFF
is shown as two orange balls (representing cysteines) and a blue ball (represent-
ing the proline). The protein models are shown as magenta wire models with active-
site cysteines and proline shown as yellow and cyan balls, respectively. The FFF clearly
distinguishes the correct active site in the crystal structure of the glutaredoxin 1ego
and the correctly folded, lowest-energy model. The FFF does not match to the active
sites of any of the higher energy, misfolded structures, four of which are shown here.
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Figure 3 

Analysis of the Bacillus subtilis genome using the thioredoxin Block 00194. The Blocks
score (computed using the publicly available BLIMPS program) is plotted on the x axis
and the number of sequences found in each scoring bin is plotted on the y axis. Those
sequences identified as 'consensus positives' [identified by both the fuzzy functional
form (FFF) and the Block] are shown as red bars. One additional sequence found by
the FFF, which is likely to be a true positive, is shown as a blue bar. All other
sequences, putative 'false positives', are shown as yellow bars. Using the Blocks
score at which all 13 of the 'consensus positives' are found, 23 false positives are
also found. In its analysis of the B. subtilis genome, the FFF identifies only seven false
positives along with the same 13 'consensus positives' (data not shown).



Detailed descriptors will only work on the experi-
mentally determined, high-quality structures. Ideally,
however, the descriptors should work on both experi-
mental structures and the cruder models provided by
tertiary-structure-prediction algorithms.

The advantages of such an approach are that one need
not establish an evolutionary relationship in order to
assign function, that more than one function can be
assigned to a given protein [an issue of major impor-
tance, because proteins are multifunctional (Box 1)]
and, ultimately, that having a structure can provide
deeper insight into the biological mechanism of pro-
tein function and regulation. The disadvantages are that
on- ,;vJ:.lv i.i.f ;>i. u; .. \. ,,'.]■.., - 

tion can be assigned and that the approach is limited to
those functions associated with proteins with at least 
one solved structure, so that a functional-site descriptor 
can be constructed. 

In this sense, structure-to-function assignment can be
thought of as 'functional threading' - find the active-
site match in a library of descriptors for known protein
active sites. This is the first step in the long process of
using structure to assign all levels of function, a goal
that is made increasingly important with the emergence
of structural genomics. Based on the progress to date,
it is apparent that structure will play an important role
in the post-genomic era of biology.
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ligh-throughput technologies im- 
press us almost every week with novel 
global results and big numbers. They of-
ten reveal important general trends that
are impossible to realize with classical,
low-throughput experimental methods,
yet (so far) they provide fewer insights 
into specific, molecular detail. Because 
of the amount of data involved, high- 
throughput technologies imply the use 
of bioinformatics methods that deal 
with information transformation, stor- 
age, and analysis. By necessity, most of 
these processes are automated. 

Partly because of the nature of cur-
rent publication schemes, the accuracy
and error margins of a given method are
often only found in small print. It is ob-
vious that each method has its limits
and also that during data processing,
some information will be lost or diluted.
Because of the current need to integrate
and add value to data, results from high-
throughput experiments (if made pub-
licly accessible) are often taken further
by third-party research that relies on the
quality of these data. Thus, I believe that
public awareness of error margins for
high-throughput experimental and
computational methods should be in-
creased. The incredibly valuable data ac-
cumulating in various heterogeneous
databases permit powerful analyses but
should not be overinterpreted. In the
following discussion, I will concentrate
on limits in computational sequence
analysis, which is far from being perfect
(Table 1), despite the fact that sequenc-
ing itself is highly automated and accu-
rate, and despite the fact that sequence
information is described in simple linear 
terms (using a four-letter alphabet). On 
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average, a 70% accuracy just to predict 
functional and structural features has to 
be considered a success (Table 1). 

Limitations in the Total Knowledge
Base of Protein Function

As these analysis methods are knowl-
edge based, one of the reasons for the
inaccuracy is that the quality of data in
public sequence databases is still insuffi-
cient (e.g., Bork and Bairoch 1996; Bha-
tia et al. 1997; Pennisi 1999). This is par-
ticularly true for data on protein func-
tion. Protein function is loosely defined;
cellular function is more than the very
complicated network of individual mo-
lecular interactions on which it is based
(Bork et al. 1998). Furthermore, the se-
mantics for functional features are not
always established. For instance, the
notion of a "protein complex" not only
depends heavily on detection and puri-
fication methods, which, in turn, are
constantly evolving, but also on envi-
ronmental conditions. Protein function
is context dependent, and both molecu-
lar and cellular aspects have to be con-
sidered (for review, see Bork et al. 1998).

To illustrate some of this complex-
ity, a good example is lactate dehydro-
genase: This gene product can act both
as a dehydrogenase and an eye lens 
structural protein, depending on its con- 
text (for review, see Piatigorsky and 
Wistow 1991). Even without the compli- 
cation of a second, unrelated role for the 
same gene product, do we know enough
about the function of lactate dehydroge- 
nase, one of the best-studied proteins? 
We know its biochemical pathway (at 
least in human and some model organ- 
isms), its different isoenzymes (in organ- 
isms) with different context-dependent 



properties, its regulation, and the orga- 
nization of its quaternary structure. 
However, we are probably still missing 
much information, even on crucial mo- 
lecular features: Are we sure about alter- 
native splice variants? Can we exclude 
age-dependent post-translational modi-
fications in some tissues? Our knowl-
edge is even more limited regarding
higher order functions that involve con- 
centration, compartmental organiza- 
tion, dynamics, regulation, and perhaps 
even the impact of external environ- 
ment. Often, the available data give at 
best some reliable qualitative results on 
functional features but far from a com- 
plete understanding of functionality. 
Yet our ability to annotate genome se- 
quences and translate information 
therein relies heavily on the summaries 
of features attached to each sequence in
the respective public databases. 

Limitations of Gene Expression 
Data Extrapolations 
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As more high-throughput technologies 
follow, the data will become more com-
plicated than sequences. Novel comple-
mentary data types such as gene expres-
sion arrays will generate more func-
tional information, but conclusions
from these data are often stretched with
regard to protein products. The expres-
sion of genes and their reciprocal pro-
teins seems to correlate weakly, with a
correlation coefficient of 0.48 (Anderson
and Seilhammer 1997). Furthermore, re-
cent studies (Hanke et al. 1999; Mironov
et al. 1999) show that alternative splic-
ing might affect >30% of the human
genes, although measurements at the
protein level have yet to confirm this.
Finally, the number of known post- 
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Prediction of 



Human promoters 

Human regulatory RNA elements 
Human genes (only presence) 

Human SNPs by EST comparison 

Human alternative splicing 
Transmembranes (only presence) 

Signal peptides (only presence) 
GPI anchors (incl. cleavage site)
Coiled coil (only presence)
Secondary structure (Three states) 
Buried or exposed residues 
Residue hydration 
Protein folds (in Mycoplasma) 

Homology (several methods) 

Functional features by homology 

Function association by context 
Cellular localization (two states) 
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40% of new DNA • 
70% of chromosome 22 

30% of all proteins with SNP 

50% of all splice sites 
99% of annotated test set 

100% of annotated test set 
1 00% of annotated test set 
90% of annOt*ted»e©i!ed soil- 
100% of 3D test set 
100% of 3D test set 
100% of 3D test set 
50% of Mycoplasma ORFs 

50% of 3D test set 

70% unicellular genomes 

10% high confidence in yeast 
100% of annotated test set 
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Uanslational modifications of gene 
products is increasing constantly, so 
ihat the complexity at the protein level 
is enormous. Each of these modifica- 
tions may change the function of the 
.respective 'gene products drastically. 
(The entire aspect of context-dependent 
gene regulation is excluded from current 
discussions as we are only beginning to 
understand the complex underlying ge- 
netic machinery. For example, promoter 
prediction in eukaryotes has a success of 
only -3S% (Table 1), and there are many 
other regulatory elements that we can- 
not predict at ill.) 
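
The correlation coefficient of 0.48 quoted above for mRNA and protein
levels can be put in perspective with a one-line calculation. The sketch
below assumes the value is a Pearson correlation coefficient (the text does
not say which kind it is); under that assumption the square of the
coefficient is the fraction of variance in one quantity that a linear fit
on the other accounts for. The function and constant names are illustrative
only.

```python
def shared_variance(r: float) -> float:
    """Return r squared: the fraction of variance explained by a linear fit,
    assuming r is a Pearson correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


# Value quoted in the text (Anderson and Seilhamer 1997).
R_MRNA_PROTEIN = 0.48

explained = shared_variance(R_MRNA_PROTEIN)
print(f"variance explained: {explained:.0%}")  # variance explained: 23%
```

Read this way, roughly three quarters of the variation in protein abundance
would remain unexplained by transcript abundance, which is the sense in
which conclusions drawn from expression arrays are "stretched".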



Limitations Created by 
Third-Party Analyses

Public releases of completely sequenced genomes exceed a rate of one per
month, with thousands of function predictions therein. Gene annotation via
sequence database searches is already a routine job, but even here the
error rate is considerable (Table 1). The lower limit of errors in current
functional annotation of large-scale sequencing projects is 8% (Brenner
1999). As errors accumulate and propagate (Bork and Bairoch 1996; Bhatia
et al. 1997; Smith and Zhang 1997; Bork and Koonin 1998; Pennisi 1999) it
becomes more difficult to infer correct function from the many
possibilities revealed by a database search. Increasing these
complications is the fact that computer programs often cannot even
retrieve the source of the stored information (Doerks et al. 1998).

Use of Complementary Information to Limit Errors in Function Prediction

Some new information can be retrieved from completely sequenced genomes;
for example, function can be predicted by exploitation of genomic context.
Based on the observation that interacting proteins in one organism
sometimes have homologs in other organisms fused together in a single
gene, Marcotte et al. (1999a) predicted novel interactions for 50% of
yeast proteins using gene fusion information. However, they noted an
overlap with classical methods and an error rate of 82%. To see a signal
they had to correct for domains present in many proteins (Marcotte et al.
1999a). By considering only orthologs with fission and fusion events
(Enright et al. 1999; Snel et al. 2000), the signal-to-noise ratio
increases and the number of predictions drops dramatically (7% of
Escherichia coli proteins; Enright et al. 1999). With a particular
question in mind, "Does protein X have interaction partners?", the
generation of hypotheses is extremely useful; yet to provide a general
overview of protein function, it is advisable to keep the errors small.
Further information can be added later, which is easier than retracting
stored information. But how do we incorporate the information on error
margins? Such estimates (sometimes not even the sources of the annotation)
are not visible in current databases that store the results of
computational approaches.

Taking the 70% Hurdle 

As noted above, most prediction schemes extrapolate from current
knowledge, and many bioinformatics methods have difficulty exceeding a
70% prediction accuracy (numbers in Table 1 are often overestimates
because the test sets used are usually not representative of all
sequences). On one hand, current methods seem to capture important
features and explain general trends; on the other hand, 30% of the
features are missing or predicted wrongly. This has to be kept in mind
when processing the results further. Also the 70% accuracy often
attaches to

methods that [illegible in scan]
such as sequences; making estimates 
about the prediction of cellular features 
is much more difficult as one first has to 
agree on semantics (or ontology in a da- 
tabase sense) to describe complex pro- 
cesses in a comparable way. 
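
The warning about processing 70%-accurate results further can be made
concrete with a small calculation. The sketch below is an illustration,
not something taken from the article: it assumes that each step in a chain
of predictions is correct with the same probability and that the steps
fail independently of one another, which real pipelines need not satisfy.
The function name is hypothetical.

```python
def chained_accuracy(per_step: float, steps: int) -> float:
    """Probability that every step in a chain is correct, assuming each step
    is right with probability `per_step` and errors are independent."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


for n in (1, 2, 3):
    print(n, round(chained_accuracy(0.7, n), 3))
# 1 0.7
# 2 0.49
# 3 0.343
```

Under these assumptions, a result that rests on three consecutive 70%
predictions is more likely to be wrong than right.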

All of the above focuses on limita- 
tions in the computational prediction of 
qualitative features. There remains a 
long way to go until we are able to de- 
scribe molecular processes quantita- 
tively; current simulations of complex 
systems are still very rough and simplistic.
However, there is still no doubt that
sequence analysis is extremely powerful 
and that the generation of hypotheses 
derived by computational methods will 
be more and more often the first success- 
ful step in the design of experiments. If 
70% of such experiments were success- 
ful, the speed of scientific discoveries 
would grow exponentially. 

The publication costs of this article 
were defrayed in part by payment of 
page charges. This article must therefore 
be hereby marked "advertisement" in 
accordance with 18 USC section 1734 
solely to indicate this fact. 

Protein annotation: detective work for function prediction



Computer analysis of genome sequences is currently one of the essential
steps for obtaining functional and structural information about the
respective gene products. Database searches are used to transfer
functional features from annotated proteins to the query sequences. With
the increasing amount of data, more and more software robots perform this
task (Ref. 1). While robots are the only solution to cope with the flood
of data, they are also dangerous because they can currently introduce and
propagate misannotations. On the one hand, functional information is often
only partially transferred (underprediction). For example, information is
not usually extracted for each functional unit (protein domain) but is
taken from the one-line description of the best database match (so
multifunctionality is rarely considered). On the other hand,
overpredictions are common because the highest-scoring database protein
does not necessarily share the same or even similar functions.



Definition and collection of 
uncharacterized protein families 

To avoid unnecessary propagation of 
poor annotation, we have collected putative,
poorly annotated proteins that are usually
labeled as 'hypothetical' or just as 'ORF'
(open reading frame). We operationally
defined uncharacterized protein families 
(UPFs) to be families of proteins that: (1) 
contain members in at least three taxonomi- 
cally distinct (and phylogenetically 'distant') 
species; and (2) do not contain (to the best 
of our knowledge) biochemically charac- 
terized proteins. 
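
The two-part operational definition above can be written as a simple
predicate. This is a minimal sketch under stated assumptions: the `Member`
record and its field names are hypothetical, and "taxonomically distinct"
is approximated by counting distinct species names, so the phylogenetic
distance between the species is not checked.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Member:
    """One protein in a candidate family (hypothetical record layout)."""
    entry: str                               # database identifier
    species: str                             # source organism
    biochemically_characterized: bool = False


def is_upf(members: Iterable[Member], min_species: int = 3) -> bool:
    """Criterion (1): members in at least `min_species` distinct species.
    Criterion (2): no member is biochemically characterized."""
    members = list(members)
    distinct_species = {m.species for m in members}
    has_characterized = any(m.biochemically_characterized for m in members)
    return len(distinct_species) >= min_species and not has_characterized


# Toy family with made-up entry names.
family = [
    Member("HYP1", "Escherichia coli"),
    Member("HYP2", "Bacillus subtilis"),
    Member("HYP3", "Saccharomyces cerevisiae"),
]
print(is_upf(family))  # True
```

Marking a single member as characterized makes the predicate false, which
mirrors the rule that a family leaves the UPF list as soon as one member
is biochemically characterized.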

A collection and classification of these 
proteins should allow: (a) utilization of family 
information and thus a more detailed char- 
acterization; (b) simplification of update pro- 
cedures for the entire families if functional 
information becomes available for at least 



one member; and (c) a careful annotation
of functional features that avoids the pitfalls
described above.

As [illegible in scan] projects progress, more and more of these
UPFs emerge in sequence databases. We
gave high priority to families that contain
members in at least two of the three major
kingdoms (archaea, eubacteria, eukaryotes).
The original family definition was based
on significant hits in the statistics provided by
FASTA (Ref. 4) or gapped BLAST (Ref. 5).

Annotation of UPFs in SWISS-PROT 
and PROSITE databases 

A serial number has been assigned to 
each UPF and to each of the corresponding 
SWISS-PROT (Ref. 6) entries. A SWISS-PROT 
document file lists all the current UPFs and 
their members in SWISS-PROT. This document
is available on the WWW (Ref. 7). In
the majority of cases, PROSITE entries (Ref. 8) have
already been created to document the 
respective family. Whenever a member of a 
UPF family is biochemically characterized, 
that family ceases to be considered as a UPF 
and is deleted from the list. However, infor- 
mation is provided that allows its history to 
be traced. For example: 

Family: UPF0002 [DELETED]
Taxonomic range: Eubacteria
Comments: Now characterized as a family of pseudouridylate synthases
(EC 4.2.1.70).
Prototype: RSUA_ECOLI (Accession No. P33918)
PROSITE entry: PDOC00885

Function prediction for the UPFs 

The annotation is handled rather conservatively (see below) because
functional overpredictions are most dangerous given the many
opportunities for error propagation in sequence databases. Nevertheless,
we intended to retrieve as many functional features as possible for each
UPF using comparative analysis. Thus, each UPF was subjected to a variety
of sequence analysis methods (Ref. 9). In brief, several members of each
UPF were compared with a database of non-identical protein sequences,
daily updated at the EMBL, using PSI-BLAST (Ref. 5) with a conservative
expected ratio of false positives (E = 0.001) as a threshold for each
iteration. Sequences were preprocessed by filtering for transmembrane
(Ref. 10) and coiled-coil regions (Ref. 11). A multiple alignment was
constructed for each UPF using ClustalX (Ref. 12). If PSI-BLAST did not
identify a relationship to characterized proteins, other iterative methods
such as Wisetools (Ref. 13) and Most (Ref. 14) were applied. They also use
family information, that is, give more weight to conserved positions and
so on, but have the advantage that the underlying multiple alignments can
be checked and improved manually (at the cost of speed and the 'easy to
use' feature).

Finally, all searches were repeated using a sequence database that only
contained sequences from entirely sequenced genomes to reduce noise
effects (Ref. 15). For example, PSI-BLAST E-values depend on the database
size: a hit might be significant using a small database but becomes
insignificant if more background noise (unrelated or redundant sequences)
is added.
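
The dependence of E-values on the database can be illustrated numerically.
The sketch below relies on the standard property that, for a fixed
alignment score, a BLAST-type E-value grows in proportion to the size of
the search space. It ignores the effective-length corrections that real
programs apply, and the database sizes and the example E-value are made
up; only the 0.001 threshold comes from the text.

```python
def rescale_evalue(evalue: float, db_size_from: int, db_size_to: int) -> float:
    """Rescale an E-value to a database of a different size, assuming the
    alignment score is unchanged and E is proportional to database size."""
    if db_size_from <= 0 or db_size_to <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * db_size_to / db_size_from


THRESHOLD = 1e-3           # per-iteration cutoff quoted in the text

e_small = 1e-4             # hypothetical hit in a small database
small_db = 1_000_000       # residues, hypothetical
large_db = 100_000_000     # residues, hypothetical

e_large = rescale_evalue(e_small, small_db, large_db)
print(e_small <= THRESHOLD, e_large <= THRESHOLD)  # True False
```

The same alignment passes the cutoff in the small database and fails it in
the hundredfold larger one, which is the effect the restriction to
entirely sequenced genomes is meant to limit.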

In many cases, the iterations revealed 
the relationship of the UPFs with other pro- 
teins, families or superfamilies. As the main 
focus here was to assign functional features, 
the iterations have not been continued when
a reasonable prediction could be made.
Criteria for the latter were matches to known 
active site patterns or conserved motifs 
resembling those in PROSITE as well as the 
positioning of UPF members within phylo- 
genetic trees. Transmembrane regions were 
identified in 13 (22%) of the 58 UPFs,
although functional predictions for these 
13 have not been made. Of the remaining 
45 UPFs, 25 could be related to proteins 
with annotated functional features (Table 1). 

Pitfalls in function assignments 

The predictions required careful inspec- 
tion of the functional annotations of the 
matched database proteins. To illustrate the 
difficulties, Table 2 shows the result of a 
Blast search for UPF0002 that includes quite 
a few proteins with annotations (in addition 
to the first hits that are labeled as 'hypotheti- 
cal'). Only one can give a clue about func- 
tional features; others are simply wrong, 
misleading or uninformative. 

Another typical assignment error is caused by the sequence similarity of
the query to a region that is independent from the one that was the basis
for the annotation. For example, the hypothetical protein HI0722
(Accession No. P44842, ID: YIGZ_HAEIN), a member of the UPF0029 family,
shows significant similarity to two proteins (GenBank entries gi|2314657
and gi|2688341) in Helicobacter pylori and Borrelia burgdorferi,
respectively, which are wrongly annotated as proline dipeptidases (pepQ).
The annotation is based on the N-terminal homology of these two proteins
with the C-terminal region of proline dipeptidase (pepQ) (gi|42358) of
E. coli, which does not harbor the catalytic activity of this enzyme.

There were even examples in which homologs scored best in PSI-BLAST
(Ref. 5) that did not have the same catalytic activity because active site
residues of the characterized family were not conserved. However, there
were significantly lower scoring homologs with perfect matches of their
(distinct) catalytic site residues to the query. For example, the UPF0046
family has clear amino acid similarity to proteases that are easily found
by PSI-BLAST (Ref. 5) in the fourth iteration; yet, residues involved in
metal-binding are only shared with a purple acid phosphatase family that
is only picked up in the ninth iteration. The E-value of 1e-5 compared
with proteases (E-value of 5e-78) remains considerably higher in
subsequent iterations. Such instances have implications for current
function prediction programs in which the function of the best hit is
transferred. Clearly, another generation of methods is required that
includes checks for the presence of functionally important residues.

Use of phylogenetic trees 

As most of the database proteins with functional annotations were only
distantly related to members of the UPFs, transfer of functional
information is extremely difficult and arbitrary. The majority of UPFs
turned out to be related to enzymes, and based on the conservation of the
active site residues one can assume that at least the basic catalytic
mechanism remains the same. This, however, is of little predictive value
as some families, e.g. those with the α/β hydrolase fold collected in SCOP
(Ref. 16), are huge and harbor numerous distinct catalytic activities,
such as lipases, esterases, dehalogenases, peptidases, peroxidases and
lyases. We have therefore constructed phylogenetic trees of selected
members of the UPFs and of related, but distinct families that have been
identified during the analysis (Fig. 1). On some occasions, the UPF
members clearly clustered with proteins that all performed the same
function (Fig. 1a), but in most of the cases the UPFs were of equal
distance to distinct enzymatic activities (Fig. 1b), thus not allowing any
detailed predictions.

Although the studied protein families were bound to be difficult for
function predictions because a considerable number of teams were unable
to find functional



Table 1. Predicted functional features for 25 UPFs

UPF No.       Family size (a)   Predicted function

02            70                Pseudouridylate synthase
[illegible]   60                [illegible]
07            15                Cytidyltransferase (b)
08            30                ATPase
09            10                GTPase
10            10                Aldose 1-epimerase
11            10                Methyltransferase (b)
12            25                Nitrilase
17            30                Hydrolase
19            15                Phosphate-binding protein (TIM barrel)
20            40                N6-adenine-specific methylase
21            50                ATPase
26            30                Two domain protein: iron/sulfur binding and
                                amidotransferase
30            10                Amidotransferase
31            30                Sugar kinase
34            20                Pyrimidine-binding oxidoreductase (TIM barrel)
35            20                Mutator mutT protein
                                (7,8-dihydro-8-oxoguanine triphosphatase)
36            70                Hydrolase
37            10                Oxidoreductase
38            35                ATPase (b)
42            10                ATPase
46            15                Phosphatase
49            50                N6-adenine-specific methylase
53            40                CBS domain protein
55            10                Glutaredoxin

(a) The numbers of family members are approximate because of daily changes
in databases and loose family definitions.
(b) E. coli member also predicted by Koonin et al. (Ref. 17) (UPF0007:
nucleotidyltransferase).
Abbreviation: UPFs, uncharacterized protein families.




Table 2. Misleading annotations: PSI-BLAST results for the UPF0002 family
(first iteration)

Ranking  Annotation                                         Probability  Commentary

1        gnl|PID|e332795 (Z98268) hypothetical protein      (2e-75)
         MTCI125.33 [Mycobacterium tuberculosis]...

4        sp|P33643|SFHB_ECOLI [remainder illegible]         (1e-67)      SFHB is a gene name (suppressor of the
                                                                         temperature-sensitivity of ftsH1 mutation)
                                                                         and does not give much functional insight

5        gnl|PID|e1185138 (Z99112) alternative gene         (?e-65)
         name: ylmL, similar to hypothetical proteins
         [Bacillus subtilis]...

37       sp|Q12362|RIB2_YEAST DRAP DEAMINASE                (7e-50)      The homology is not in the catalytic region
         >gi|1078332|pir||S50972 RIB2 protein - yeast                    and does not hold for other deaminases
         (Saccharomyces cerevisiae) >gi|642221 (Z21618)
         DRAP deaminase [Saccharomyces cerevisiae]
         >gi|1419887|gnl|PID|e252279 (Z74808) ORF
         YOL066c [Saccharomyces cerevisiae]...

40       sp|P33918|RSUA_ECOLI 16S PSEUDOURIDYLATE           (2e-48)      Function prediction based on this protein
         516 SYNTHASE (16S PSEUDOURIDINE 516
         SYNTHASE) (URACIL HYDROLYASE)

41       sp|Q47417|YQCB_ERWCA EXOENZYME                     (7e-48)      Misleading annotation, operon
         REGULATION REGULON ORF1 >gi|628643|                             architecture is not conserved between
         pir||S45107 hypothetical protein 1 - Erwinia                    species
         carotovora >gi|496598 (X79474) ORF1 [Erwinia
         carotovora]...

Annotations that hamper functional predictions illustrated by the example
of the UPF0002 family. Based on the recent experimental characterization
of pseudouridylate synthase (Ref. 18), this family has been deleted from
the UPF list (see text). Nevertheless, the various, partly contradictory
annotations (bold) are extremely difficult to parse for automatic function
prediction programs. For brevity, the PSI-BLAST results have been cut
(...).

[Figure 1: phylogenetic trees, panels (a) and (b); tree drawings and leaf
labels not reproduced]

Figure 1. (a) Phylogenetic trees of selected members of UPF0007 that
indicate a likely function as UPF0007 members with cytidyltransferase
activities (red); related uridylyltransferases (blue) are more divergent
(pir database entries [illegible] and pir|S49238). (b) No clear enzymatic
activity can be predicted for UPF0017 members: they clearly have the
hydrolase fold but have equal distance to [illegible] (red), esterases
(green), peptidases (blue) and other hydrolases ([illegible] gi|1001804).
The trees were calculated using CLUSTALX (Ref. 12).



features therein, it is noteworthy that there
was not a single case in which we were
able to predict the precise mechanism and
the substrate specificity. Nevertheless, the
information about an enzymatic activity and
the likely reaction mechanisms of the 25
UPFs should prove useful for the analysis of
upcoming genome sequences.

Annotation with the right level of 
precision helps in future projects 

In summary, we were able to provide
some functional annotation for more than
700 of about 1300 proteins clustered in 25
of the 58 distinct UPFs. Most of them are
currently named 'hypothetical protein' so
that their annotation adds enormous value
to these sequences. For another 13 UPFs
currently containing about 250 proteins,
the presence of transmembrane regions
was recorded. This annotation is now being
incorporated into PROSITE and SWISS-PROT
so that these features can be assigned to
newly sequenced genes as well.

The difficulties we faced in assigning
functions by sequence similarity also indi- 
cate that many of the automatic predictions 
by most of the software robots are probably 
erroneous. Because of the current policies of 
most of the sequence databases, correction 
of annotations is very hard to realize. Thus, 
there should be a combined effort by the 
database teams, the authors of the current 
entries, and the community, to work towards 
a careful functional annotation of all the 

[5] Alanine Scan Mutagenesis of Chemokines
By Joseph Hesselgesser and Richard Horuk

Introduction 

Chemokines are a family of chemotactic cytokines that play a critical
role in the regulation and trafficking of immune cells. 1 Some chemokines
have also been shown to be suppressive factors 2 of human
immunodeficiency virus type-1 (HIV-1). There are currently three families
of chemokines, grouped according to the arrangement of their invariant
cysteines. The C-X-C branch, in which the first two cysteines are separated
by an intervening amino acid, includes interleukin-8 (IL-8), melanoma
growth stimulating activity (MGSA), platelet factor 4 (PF4), and stromal
cell derived factor-1 (SDF-1). 3 This group is further divided by the
presence (IL-8, MGSA) or absence (SDF-1, PF4) of an E-L-R motif prior to
the first N-terminal cysteine. The C-X-C chemokines are generally
chemoattractors and activators of neutrophils, exceptions being PF4, which
also attracts fibroblasts and monocytes, and SDF-1, which is a
chemoattractant for T cells. The C-C chemokines are chemoattractants and
cellular activators for monocytes, basophils, eosinophils, and
lymphocytes. This is the largest family of chemokines and includes the
macrophage inflammatory proteins MIP-1α and MIP-1β, RANTES, monocyte
chemoattractant proteins MCP-1 to MCP-4, I-309, and eotaxin. 3 The third
subdivision of the chemokines is the C branch, which is characterized
by its only member lymphotactin, in which only two cysteines are
conserved. 4

In addition to their normal role in immune cell function, chemokines 
play an important part in a number of autoimmune diseases such as multiple 
sclerosis and rheumatoid arthritis, 3 and chemokine receptors are gateways 
of infection for HIV-1 and the malaria parasite. 5-8 Thus, it is essential to
understand how chemokines interact with their cellular receptors to pro- 
duce their biological effects to develop therapeutic approaches of inter- 
vention. 



Rationale 

Structure-function relationships between ligands and their receptors 
can be examined in a number of ways. Site-directed mutagenesis, particu- 
larly alanine scan mutagenesis, has been a successful technique for identi- 
fying functionally important residues in proteins. This approach has been 
applied to a number of receptors and ligands as exemplified for human 
growth hormone (hGH), IL-8, MGSA, CXCR1 (IL-8RA), and CXCR2 (IL-8RB). 9-13
In a variation of alanine scan mutagenesis, only the charged residues of
a molecule are replaced, and thus residues forming ion pairs with
charged residues in a cognate receptor can be identified. In addition,
substitutions of charged amino acid residues by residues with increasing side-
chain length or bulkiness can be used to determine what can be tolerated 
for receptor binding. Finally, replacement of hydrophobic residues by ala- 
nine gives a glimpse of important nonaqueous interactions between ligand- 
receptor pairs. 
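
The bookkeeping behind an alanine scan, and behind the charged-residue
variation described above, can be sketched in a few lines. This is an
illustration only: the function name and the toy peptide are hypothetical
(it is not a chemokine sequence), and treating D, E, K and R as the charged
residues is an assumption, since histidine is sometimes counted as well.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed set of charged residues (one-letter codes); histidine is left out.
CHARGED = frozenset("DEKR")


def alanine_scan(seq: str, only: Optional[Iterable[str]] = None) -> List[Tuple[str, str]]:
    """Return (label, mutant sequence) pairs with one residue replaced by Ala.
    Positions are 1-based. Existing alanines are skipped. If `only` is given,
    just the residue types it contains are substituted."""
    allowed = None if only is None else set(only)
    mutants = []
    for pos, residue in enumerate(seq, start=1):
        if residue == "A":
            continue
        if allowed is not None and residue not in allowed:
            continue
        mutants.append((f"{residue}{pos}A", seq[:pos - 1] + "A" + seq[pos:]))
    return mutants


toy = "MKDAE"  # made-up peptide for illustration
print([label for label, _ in alanine_scan(toy)])
# ['M1A', 'K2A', 'D3A', 'E5A']
print([label for label, _ in alanine_scan(toy, only=CHARGED)])
# ['K2A', 'D3A', 'E5A']
```

Each label names one construct to be made by site-directed mutagenesis; a
set such as the hydrophobic residues could be passed as `only` in the same
way.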

Site-directed mutagenesis to determine structure-function relationships 
for ligand-receptor pairs is preferred over ligand truncation mutants be- 
cause it minimizes structural modifications to the native molecule. Another 
advantage of site-directed mutagenesis is that, if structural data exist, the 
surface residues of a protein can be substituted in a predictable pattern, 
one at a time or in groups. The substitution of groups of amino acids 

5 H. Choe, M. Farzan, Y. Sun, N. Sullivan, B. Rollins, P.-D. Ponath, L. J. Wu, C. R. Mackay,
C. LaRosa, W. Newman, N. Gerard, C. Gerard, and J. Sodroski, Cell (Cambridge, Mass.)
85, 1135 (1996).
Cayanan, P. J. Maddon, R. A. Koup, J. P. Moore, and W. A. Paxton, Nature (London)
381, 667 (1996).
R. G. Collman, and R. W. Doms, Cell (Cambridge, Mass.) 85, 1149 (1996).
8 R. Horuk, Immunol. Today 15, 169 (1994).
9 B. C. Cunningham and J. A. Wells, Science 244, 1081 (1989).
10 C. A. Hébert, R. V. Vitangcol, and J. B. Baker, J. Biol. Chem. 266, 18989 (1991).
11 J. Hesselgesser, C. Chitnis, L. Miller, D. J. Yansura, L. Simmons, W. Fairbrother, C. Kotts,
C. Wirth, B. Gillece-Castro, and R. Horuk, J. Biol. Chem. 270, 11472 (1995).
12 C. A. Hébert, A. Chuntharapai, M. Smith, T. Colby, J. Kim, and R. Horuk, J. Biol. Chem.
268, 18549 (1993).
13 S. R. Leong, R. C. Kabakoff, and C. A. Hébert, J. Biol. Chem. 269, 19343 (1994).
from IL-8 with those from MGSA to produce a hybrid protein capable of
exhibiting high-affinity binding to CXCR1 (native MGSA binds only with
low affinity) provides an excellent example of the utility of this technique
in helping to identify domains of IL-8 that are responsible for receptor
specificity. 14 A further example of the application of alanine replacement
is BB-10010, a variant of MIP-1α, that has a single substitution Asp to Ala
at position 27. 15 This substitution results in a molecule that is fully active
but does not form the extremely high molecular weight multimers that 
the wild-type chemokine does. BB-10010 has thus found favor in clinical 
situations in which patients undergoing chemotherapy can be dosed with 
high concentrations of the chemokine without fear of aggregate formation 
that could reduce its efficacy. 



Protocols 

Expression Systems 

Recombinant expression in Escherichia coli is one of the most common
methods for producing chemokines. 16-19 A number of vectors have been
described including the pET-3 and pET-11 series, from Stratagene (La
Jolla, CA). These are generally used in combination with a BL21(DE3) E.
coli strain (Stratagene) that contains T7 RNA polymerase under the control
of lacUV5. This expression system allows genes placed downstream of the
RNA polymerase binding site (in pET vectors) to be expressed on addition
of isopropylthiogalactoside (IPTG). Upstream cloning of a signal sequence,
prior to the chemokine gene, such as human growth hormone (hGH) or
heat-stable enterotoxin II (hstII), 11 has been shown to aid in the expression
of a soluble secreted protein.

Once an appropriate vector construct has been made with the chemo- 
kine gene of interest, site-directed mutagenesis can be performed to gener- 
ate a variety of clones. The site-directed alanine substitution mutants can 

14 H. B. Lowman, P. H. Slagle, L. E. DeForge, C. M. Wirth, B. L. Gillece-Castro, J. H. Bourell,
and W. J. Fairbrother, J. Biol. Chem. 271, 14344 (1996).
15 M. G. Hunter, L. Bawden, D. Brotherton, S. Craig, S. Cribbes, L. G. Czaplewski, T. M.
Dexter, A. H. Drummond, A. H. Gearing, and C. M. Heyworth, Blood 86, 4400 (1995).
16 I. Lindley, H. Aschauer, J.-M. Seifert, C. Lam, W. Brunowsky, E. Kownatzki, M. Thelen,
P. Peveri, B. Dewald, V. von Tscharner, A. Walz, and M. Baggiolini, Proc. Natl. Acad. Sci.
U.S.A. 85, 9199 (1988).
17 R. Horuk, D. G. Yansura, D. Reilly, S. Spencer, J. Bourell, W. Henzel, G. Rice, and E.
Unemori, J. Biol. Chem. 268, 541 (1993).
18 M. C. N. Johnson et al., J. Biol. Chem. 271, 10853 (1996).
19 J. Zagorski and J. E. DeLarco, Protein Expression Purif. 5, 337 (1994).
be made by constructing oligonucleotide probes that cover the desired
mutation followed by polymerase chain reaction (PCR) to fill in the rest
of the chemokine gene sequence. There are a variety of kits available from
several companies that use similar, simple, and easy-to-follow methods
to insert the desired mutations. These kits include the Transformer Site-
Directed Mutagenesis Kit (Clontech, Palo Alto, CA) and the Chameleon
Double-Stranded Site-Directed Mutagenesis Kit (Stratagene). The basic
principle behind these kits is illustrated in Fig. 1. With these bacterial
expression systems it is simple to produce small amounts of proteins using
1-liter shake flasks; alternatively, this process can be scaled up to larger
10- to 20-liter fermentation runs to produce gram quantities of protein. In
general, chemokines can be produced as secreted, fully folded proteins, 11,17
but sometimes problems with folding and also with the generation of
N-terminal addition mutants can occur (see Horuk et al. 19a for a discussion).

Chemokines have also been produced in mammalian expression sys- 
tems. 20 A variety of cell lines, such as human kidney 293 cells, Chinese 
hamster ovary (CHO) cells, and COS cells, can be used. Mammalian expres- 
sion vectors are generally larger than E. coli vectors, with viral promoters 
and enhancers (pSI, pCI, and pCI-neo from Promega, Madison, WI; 
pcDNA3 from Invitrogen, San Diego, CA; or pPUR from Clontech). The 
use of an upstream signal sequence, such as hGH, CD4, or human immuno- 
globulin (Ig), can greatly enhance the production of a soluble secreted 
protein. A major advantage of mammalian expression is that the secreted 
protein is usually correctly folded. In addition, with the introduction of
defined, nonserum-containing medium (HyQ-CCM 1 to 5, Hyclone, Logan, 
UT, and CHO-S-SFM II, GIBCO-BRL, Grand Island, NY) downstream 
purification steps are easier, and there is less risk of contamination with 
lipopolysaccharide (LPS), which is frequently a problem with E. coli expres- 
sion systems. 



Peptide Synthesis 

An alternative to protein expression that has found increasing use for
the generation of chemokine mutants is peptide synthesis. Although this
can be an expensive process ($20/amino acid plus purification), it is quick
compared to expression methods. For a discussion of peptide synthesis of
chemokines see Clark-Lewis et al. 20a

19a R. Horuk, D. Reilly, and D. Yansura, Methods Enzymol. 287 [1], 1997 (this volume).
20 E. Balentien, J. H. Han, H. G. Thomas, D. Z. Wen, A. K. Samantha, C. O. Zachariae,
P. R. Griffin, R. Brachmann, W. L. Wong, K. Matsushima, and R. Derynck, Biochemistry
29, 10225 (1990).
20a I. Clark-Lewis, L. Vo, P. Owen, and J. Anderson, Methods Enzymol. 287 [16], 1997
(this volume).

Fig. 1. Strategy for generation of alanine scan mutants. (1) Denature the double-stranded
plasmid DNA. Anneal the two oligonucleotide primers, the R primer altering a unique
restriction site and the M primer containing the mutation codon. (2) Perform primer extension
using dNTPs and T7 or T4 DNA polymerase and ligation with T4 DNA ligase. Digest DNA
with restriction enzyme for unique site. Renature DNA to result in a mixture of linear and
plasmid DNA. Transform into E. coli strains [BMH 71-18 mutS (Clontech) or XLmutS
(Stratagene)] that are unable to perform DNA mismatch repair. (3) Recover plasmid DNA
(Plasmid Mini Kit, Qiagen, Chatsworth, CA). (4) Perform second digestion with restriction
enzyme. Transfect into E. coli and recover final plasmid. Sequence the mutated gene to verify
the correct sequence.







Most chemokines are very basic proteins with isoelectric points between 
pH 8.0 and 10.5. In contrast most E. coli and serum proteins from mamma- 
lian cell expression have acidic isoelectric points. Thus chemokines can be 
purified at neutral pH by a combination of S-Sepharose, heparin-Sepharose, 
and hydrophobic interaction (phenyl-Sepharose) chromatography (see 
Horuk et al. 19a). This procedure coupled with reversed-phase high-performance 
liquid chromatography (HPLC) can give chemokines that are >95% 
pure based on silver staining sodium dodecyl sulfate-polyacrylamide gel 
electrophoresis (SDS-PAGE) and amino acid composition." Soluble 
chemokine supernatants from E. coli or mammalian cell supernatants can 
be purified by this method. Alternatively, the construction of expression 
plasmids containing nucleotides encoding hexahistidine, FLAG peptide, 
glutathionine, or immunoglobulin sequences engineered into the cDNAs 
of specific chemokines can greatly aid in their purification. 21-23 The protein 
epitope can be engineered into the amino- or carboxyl-terminal region of 
the molecule (the position is largely dependent on the maintenance of full 
biological activity). 

Biological Activity 

Once wild-type and mutant chemokine proteins have been generated 
and purified, they are usually analyzed by receptor-binding and bioactivity 
studies to determine structure-function relationships. For receptor-binding 
studies, a wide variety of radiolabeled chemokines are available commercially 
(Dupont NEN, Boston, MA; Amersham, Arlington Heights, IL) or 
can be iodinated. 23a Binding assays can be carried out on whole cells or 
membranes prepared from cells that express the appropriate chemokine 
receptors. Scatchard analysis of the binding data can then be used to determine 
the relative receptor-binding affinity of the mutant chemokine, which 
is defined as (KD of the wild type/KD of the mutant) × 100%. Biological assays 
for chemokines are numerous and are described elsewhere. 23b However, 
one biological assay that is used as a standard is chemotaxis. All chemokines 
induce a migratory response in target cells that express functional G-protein-coupled 
chemokine receptors. In addition, a primary response to ligand-receptor 
binding is the induction of the intracellular release of calcium 
stores. Calcium flux assays are also popular methods of assessing chemokine 
function. 23c 

21 H. M. Sassenfeld, Trends Biotechnol. 8, 88 (1990). 
22 A. S. Robeva, R. Woodard, D. R. Luthin, H. E. Taylor, and J. Linden, Biochem. Pharmacol. 
51, 545 (1996). 
23 A. Kuusinen, M. Arvola, C. Oker-Blom, and K. Keinanen, Eur. J. Biochem. 233, 720 (1995). 
23a G. L. Bennett and R. Horuk, Methods Enzymol. 288 [10] (1997). 
23b D. Baly, U. Gibson, D. Allison, and L. DeForge, Methods Enzymol. 287 [6], 1997 (this 
volume); R. C. Newton and K. Vaddi, Methods Enzymol. 287 [12], 1997 (this volume). 
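The Scatchard analysis mentioned above can be illustrated with a minimal sketch. It assumes a simple one-site binding model, in which a plot of bound/free ligand against bound ligand is a straight line with slope -1/KD and x-intercept Bmax. The function names and the synthetic data are hypothetical and are not taken from the chapter.

```python
def one_site_bound(free, kd, bmax):
    """Bound ligand predicted by a one-site model: B = Bmax * F / (KD + F)."""
    return bmax * free / (kd + free)


def scatchard_fit(bound, free):
    """Fit a straight line to (bound, bound/free) by ordinary least squares.

    Returns (KD, Bmax). For one-site binding, B/F = Bmax/KD - B/KD,
    so the slope is -1/KD and the y-intercept is Bmax/KD.
    """
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


if __name__ == "__main__":
    # Synthetic, noise-free data (hypothetical): KD = 3.5 nM, Bmax = 100 units.
    free_nm = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
    bound = [one_site_bound(f, kd=3.5, bmax=100.0) for f in free_nm]
    kd, bmax = scatchard_fit(bound, free_nm)
    print(f"KD = {kd:.2f} nM, Bmax = {bmax:.1f}")
```

With real binding data the points scatter around the line, and nonlinear regression of the untransformed data is often preferred; the linear fit is shown here only because the text refers to Scatchard analysis.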



Applications 

The following section gives an outline for the specific production of 
alanine scan mutants of MGSA that have been generated to probe structure-function 
relationships of the chemokine receptors CXCR2 and Duffy 
antigen receptor for chemokines (DARC). 11 This approach is useful in 
generating a mutant of MGSA, E6A, that is able to inhibit malaria parasite 
invasion of human erythrocytes by receptor blockade of the receptor DARC 
but is a very poor agonist of the CXCR2 receptor on neutrophils. 

Plasmid Construction 

Plasmid construction and mutagenesis are achieved by a combination 
of vectors and kits described above (see also Fig. 1) and are illustrated 
with reference to the MGSA mutants. The pMG34 E. coli secretion vector 
has been previously described. 17 This vector contains the hstII signal sequence 
to aid in secretion of the chemokine and an alkaline phosphatase 
promoter that is induced when the E. coli cells are grown in a low phosphate-containing 
medium. 24 The MGSA gene sequence is fused to the hstII sequence 
as described. 17 The resulting plasmid contains an EcoRI site 5' to 
the MGSA gene and a BsaJI site 3' to the MGSA sequence. For each 
mutant the entire MGSA gene is synthesized (~219 bases) to include the 
respective codon changes for the amino acid substitution. Double-stranded 
DNA is synthesized by the use of an AmpliTaq DNA Polymerase kit 
(Perkin-Elmer, Roche Molecular Systems, Branchburg, NJ) with appropriate 
amounts of DNA and polymerase per the manufacturer's specifications. 
DNA is purified by the use of the QIAquick Nucleotide Removal Kit 
(Qiagen, Chatsworth, CA) per the manufacturer's specifications. 



The MGSA DNA and pMG34 vector are cut separately with EcoRI 
restriction endonuclease (New England Biolabs, Beverly, MA) in 50 mM 
NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 1 mM dithiothreitol (DTT) at 37° 
for 2 hr. The temperature is then raised to 65° for 20 min to inactivate 
EcoRI, followed by addition of BsaJI endonuclease (New England Biolabs). 
The mixture is then incubated at 60° for 2 hr. This results in EcoRI overhangs 
with the MGSA DNA and cleaved plasmid. The DNA mixture is 

23c S. R. McColl and P. H. Naccache, Methods Enzymol. 288 [18] (1997); K. B. Bacon, Methods 
Enzymol. 288 [22] (1997). 
24 C. N. Chang, M. Rey, B. Bochner, H. Heyneker, and G. Gray, Gene 55, 189 (1987). 
purified by the QIAquick kit as above. The mutated MGSA DNA is then 
ligated into the vector by the use of T4 DNA ligase (20 units, 0.3 Weiss 
units; New England Biolabs) in 50 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 
10 mM DTT, 0.5 µM ATP, 25 µg/ml bovine serum albumin (BSA), 200 
ng vector, and 100 ng insert at 16° for 16 hr in a final volume of 20 µl. 
Ligase activity is stopped by heat inactivation at 65° for 20 min. 

The plasmid is transfected into E. coli by the CaCl2/heat shock method 25 
and grown in LB broth with 50 µg/ml carbenicillin. The plasmid is recovered 
from the E. coli by the use of Qiagen Plasmid Mini Kit as above. Finally, 
the plasmid is sequenced through the entire gene and restriction sites to 
verify the correct orientation and nucleotide sequence. Following plasmid 
recovery and sequencing, the plasmid is transformed into E. coli, cells are 
grown overnight in low phosphate medium, and the MGSA is purified 
as described." 

Studies with Melanoma Growth Stimulating Activity Mutants 

Because MGSA binds with high affinity to the chemokine receptors 
CXCR2 (IL-8RB) and DARC, 26,27 receptor-binding studies with the MGSA 
mutants are carried out in cells expressing these receptors. Cells are incubated 
with 125I-labeled MGSA in the absence and presence of unlabeled 
MGSA or MGSA mutants; typical competition binding curves are shown 
in Fig. 2. The E6A mutant of MGSA is able to bind with high affinity to 
DARC (KD = 7 nM compared to KD = 3.5 nM for MGSA), but in contrast 
binds with low affinity to CXCR2 (KD = 476 nM for E6A compared to 
KD = 2.3 nM for MGSA). In contrast, the R8A mutant exhibits low-affinity 
binding to both receptors (for CXCR2 KD = 850 nM and for DARC 
KD = 157 nM). Thus, the Arg-8 residue of MGSA appears to be crucial 
for expression of high-affinity binding for both receptors. 
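
As a worked illustration, the dissociation constants reported in this paragraph can be converted into relative receptor-binding affinities using the definition given earlier (wild-type KD divided by mutant KD, times 100%). The sketch below only restates the reported values; the dictionary layout and function names are hypothetical.

```python
# Dissociation constants (nM) as reported in the text for each receptor.
KD_NM = {
    "MGSA": {"DARC": 3.5, "CXCR2": 2.3},
    "E6A": {"DARC": 7.0, "CXCR2": 476.0},
    "R8A": {"DARC": 157.0, "CXCR2": 850.0},
}


def relative_affinity(kd_wild_type, kd_mutant):
    """Relative receptor-binding affinity in percent of wild type."""
    return kd_wild_type / kd_mutant * 100.0


def relative_affinities(kd_table, wild_type="MGSA"):
    """Return {mutant: {receptor: percent of wild-type affinity}}."""
    reference = kd_table[wild_type]
    return {
        name: {
            receptor: relative_affinity(reference[receptor], kd)
            for receptor, kd in values.items()
        }
        for name, values in kd_table.items()
        if name != wild_type
    }


if __name__ == "__main__":
    for mutant, by_receptor in relative_affinities(KD_NM).items():
        for receptor, percent in by_receptor.items():
            print(f"{mutant} on {receptor}: {percent:.2f}% of wild-type affinity")
```

On these numbers E6A retains half of the wild-type affinity for DARC but well under 1% for CXCR2, whereas R8A falls to a few percent or less on both receptors, which is the pattern the paragraph describes.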

In light of the binding data obtained for the E6A and R8A mutants, 
functional assays are carried out both with DARC and with CXCR2. DARC 
is known to be a cofactor for the invasion of human erythrocytes by the 
malarial parasite Plasmodium vivax 28 and the related monkey parasite 
Plasmodium knowlesi. 29 The C-X-C chemokines IL-8 and MGSA have 

25 M. Mandel and A. Higa, J. Mol. Biol. 53, 159 (1970). 
26 J. Lee, R. Horuk, G. C. Rice, G. L. Bennett, T. Camerato, and W. I. Wood, J. Biol. Chem. 
267, 16283 (1992). 
27 R. Horuk, C. E. Chitnis, W. C. Darbonne, T. J. Colby, A. Rybicki, T. J. Hadley, and L. H. 
Miller, Science 261, 1182 (1993). 
28 L. H. Miller, S. J. Mason, D. F. Clyde, and M. H. McGinniss, N. Engl. J. Med. 295, 302 (1976). 
29 L. H. Miller, S. J. Mason, J. A. Dvorak, M. H. McGinniss, and I. K. Rothman, Science 189, 
561 (1975). 
Fig. 2. MGSA competition binding studies. Competition binding was studied between 
125I-MGSA and increasing concentrations of unlabeled MGSA mutants E6A and R8A for 
(A) human erythrocytes and (B) human kidney cells transfected with CXCR2. Cells were 
incubated 1 hr at 4° with 125I-MGSA in the presence of increasing amounts of the unlabeled 
MGSA mutants. (Adapted from Ref. 11 with permission.) 



[Plot for Fig. 3: x-axis, MGSA or MGSA mutant (nM); curves shown for MGSA, E6A, and R8A.] 



Fig. 3. Inhibition of erythrocyte invasion by P. knowlesi by MGSA and MGSA mutants. 
The invasion rates are expressed as a percentage of the rate of invasion in the absence of 
chemokines. Inhibition of invasion EC50 values are as follows: MGSA, 7 nM; E6A, 8.6 nM; 
R8A, >1 µM. (Adapted from Ref. 11 with permission.) 

been shown to dose-responsively inhibit both parasite binding and invasion. 27 
Thus, the ability of MGSA and the MGSA mutants E6A and R8A to 
block P. knowlesi invasion of human erythrocytes is assessed as previously 
described." As shown in Fig. 3 the E6A MGSA mutant is able to block 






Fig. 4. MGSA and MGSA mutant stimulation of elastase release from human neutrophils. 
The EC50 value is defined as the concentration of MGSA or mutant required for half-maximal 
release of elastase from neutrophils. (Adapted from Ref. 11 with permission.) 
malarial parasite invasion in a dose-responsive manner similar to wild-type 
MGSA (for MGSA EC50 = 7 nM, for E6A EC50 = 8.6 nM). However, in 
line with the change in binding affinity of R8A to DARC, the R8A mutant 
is unable to block malarial invasion at concentrations up to 1 µM. 

The mutants are then tested for biological activity on neutrophils, which 
express CXCR2, using elastase release assays. The data from the elastase 
release assay (Fig. 4) clearly show that the MGSA mutants E6A and R8A 
are biologically inactive (EC50 10 µM) compared to wild-type MGSA (EC50 
8 nM). Thus, the use of alanine scan mutagenesis of MGSA has allowed 
us to identify an MGSA mutant, E6A, that can block malarial invasion of 
human erythrocytes but will not activate neutrophils, properties that may 
be useful therapeutically in the design of small molecules that inhibit eryth- 
rocyte invasion by P. vivax but have no effect on neutrophils. 
Conclusion 

The example of alanine scan mutagenesis described for MGSA clearly 
illustrates the powerful nature of this technique not only for identifying 
specific residues involved in the binding of ligands to their native receptors 
but also for generating potentially useful drugs. Because chemokines and 
chemokine receptors play an important role in a variety of acute and chronic 
diseases, the use of alanine scanning mutagenesis to accurately define recep- 
tor binding domains could aid in the design of molecular antagonists that 
would be useful therapeutically. 
Introduction 

Chemokines play a major role in mobilizing leukocytes to ward off 
attacks by invading pathogens. 1 Each of the two major classes of chemokines 
is selective for a particular group of immune cells. In this chapter, we discuss 
the assays available to measure the biological activities of one of these 

1 T. Schall, in "The Cytokine Handbook" (A. Thompson, ed.), p. 419. Academic Press, San 
Diego, 1994. 
Abstract 

Airway hyper-reactivity is a characteristic feature of many inflammatory lung diseases and is 
defined as an exaggerated degree of airway narrowing. Chemokines and their receptors are 
involved in several pathological processes that are believed to contribute to airway hyper- 
responsiveness, including recruitment and activation of inflammatory cells, collagen deposition 
and airway wall remodeling. These proteins are therefore thought to represent important 
therapeutic targets in the treatment of airway hyper-responsiveness. This review highlights the 
processes thought to be involved in airway hyper-responsiveness in allergic asthma, and the 
role of chemokines in these processes. Overall, the application of chemokines to the prevention 
or treatment of airway hyper-reactivity has tremendous potential. 
Introduction 

One of the key features of pulmonary diseases such as 
allergic asthma, cystic fibrosis and chronic obstructive pul- 
monary disease is the development of airway hyper- 
responsiveness (AHR) [1-3]. The factors involved in the 
development of AHR seem to differ between diseases, so 
for clarity this review will focus on the development of 
AHR during allergic asthmatic disease. In the context of 
asthma, AHR equates to an exaggerated bronchoconstrictor 
response, not only to allergens to which the subjects 
are sensitized, but also to a range of non-specific stimuli, 
including agents as diverse as cold air and methacholine. 



Under normal conditions, airway reactivity, the ability to 
alter the size of the airways reversibly in response to 
stimuli, is an essential component of homeostasis. For 
example, when there is a need to move large volumes of 
air, such as with exercise, bronchial dilation occurs. Con- 
versely, when it is important to limit or decrease the 
volume of air inspired, such as with exposure to irritating 
gases, the lung defends itself with coughing and bronchial 
narrowing. When this response is excessive, it is referred 
to as airway or bronchial hyper-reactivity or hyper-respon- 
siveness (AHR) and manifests itself as an exaggerated 
bronchoconstrictor response to various provocative 



AHR = airway hyper-responsiveness; BALF = bronchoalveolar lavage fluid; CCR = CC chemokine receptor; CXCR = CXC chemokine receptor; 
FEV1 = forced expiratory volume over 1 s; IL = interleukin; LTC4 = leukotriene C4; MCP = macrophage chemoattractant protein; MIP = macrophage 
inflammatory protein; PC20 = provocative concentration; PD20 = provocative dose; Th cells = T helper cells. 
Table 1 

Chemokine receptors and their ligands 

CXC chemokine receptors                  CC chemokine receptors 
CXCR1: IL-8, GCP-2                       CCR1: MIP-1α, RANTES, MCP-3, MIP-8 
CXCR2: IL-8, GCP-2, GROα,β,γ, ENA-78     CCR2: MCP-1 to MCP-6 
CXCR3: IP-10, MIG, ITAC                  CCR3: Eotaxin, MCP-3,4, RANTES 
CXCR4: SDF-1                             CCR4: TARC, MDC 
CXCR5: BCA                               CCR5: MIP-1α, RANTES, MIP-1β 
                                         CCR6: LARC, MIP-3α 
                                         CCR7: SLC, MIP-3β 
                                         CCR8: I-309, TARC, MIP-1β 
                                         CCR9: MIP-1α,β, MCP-1, MCP-5 
                                         CCR10: SLC, LARC, BLC-1, ESkine 
                                         CCR11: MCP-1 to MCP-5, eotaxin 

GCP-2, granulocyte chemotactic protein-2; GRO, growth-related oncogene; IP-10, γ-interferon-inducible protein 10; MIG, monokine induced by 
γ-interferon; TARC, T cell and activation-related chemokine; SLC, secondary lymphoid tissue chemokine; SDF-1, stromal cell-derived factor; 
BCA, B-cell chemoattractant; ENA, epithelial cell-derived neutrophil-activating factor; RANTES, regulated upon activation normal T cell expressed 
and secreted; ITAC, interferon-inducible T cell α chemoattractant. 



agents. Measurements of AHR have traditionally been 
used to identify individuals who are at risk of developing 
asthma or related illnesses. The essential feature of these 
tests is to provide stimuli of varying intensity, such as 
methacholine, to the airways of the individual and record 
the decrease in lung function that develops. The resulting 
stimulus-response curve is then analyzed to 
determine the quantity of agent required to produce a 
given degree of obstruction as measured by various spirometric 
or plethysmographic variables. Such changes are 
usually expressed as a percentage decrease in forced 
expiratory volume over 1 s (FEV1). The three variables that 
are most often examined in quantifying the magnitude of 
the response are the concentration of an agonist that 
induces a fixed decrease in lung function (i.e., a 20% 
decrease in FEV1), the slope of the dose-response curve, 
and the dose at which a plateau can be produced. Typically, 
the response is expressed as either a provocative 
dose (PD20) or a provocative concentration (PC20). 
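
To make the provocative-concentration idea concrete, the sketch below estimates the concentration producing a 20% fall in FEV1 from the two challenge steps that bracket it. It assumes log-linear interpolation between those steps, a common convention that this review does not itself describe; the function names and example values are hypothetical.

```python
import math


def percent_fall(baseline_fev1, fev1):
    """Percentage decrease in FEV1 relative to baseline."""
    return (baseline_fev1 - fev1) / baseline_fev1 * 100.0


def provocative_concentration(conc_below, fall_below, conc_above, fall_above,
                              target_fall=20.0):
    """Interpolate, on a log-concentration scale, the concentration at which
    the fall in FEV1 reaches target_fall percent.

    conc_below/fall_below: last step with a fall smaller than the target.
    conc_above/fall_above: first step with a fall at or above the target.
    """
    if conc_below <= 0 or conc_above <= conc_below:
        raise ValueError("concentrations must be positive and increasing")
    if not (fall_below < target_fall <= fall_above):
        raise ValueError("the two steps must bracket the target fall")
    log_c1 = math.log10(conc_below)
    log_c2 = math.log10(conc_above)
    fraction = (target_fall - fall_below) / (fall_above - fall_below)
    return 10 ** (log_c1 + (log_c2 - log_c1) * fraction)


if __name__ == "__main__":
    # Hypothetical challenge: 10% fall at 1 mg/ml, 30% fall at 4 mg/ml.
    pc20 = provocative_concentration(1.0, 10.0, 4.0, 30.0)
    print(f"Estimated PC20: {pc20:.2f} mg/ml")
```

A lower PC20 indicates a more responsive airway, because less agonist is needed to reach the same fixed decrease in lung function.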

How this hyper-responsive state is acquired is poorly 
understood; however, in general, as the disease process 
becomes more severe the airways become more responsive. 
At present it is believed that AHR can result from the 
coordination of several mechanisms, some or all of which 
might be operative in individual asthmatics. In asthma a 
relationship seems to exist between the inflammatory state 
of the airways and the severity of hyper-responsiveness. In 
addition, airway remodeling, including smooth muscle 
hyperplasia/hypertrophy, collagen deposition and sub- 
epithelial fibrosis, might contribute to the development of 
AHR [4-6]. Because recent work in the field of chemokine 



biology has highlighted a role for these proteins in many of 
these inflammatory processes, chemokines might be inti- 
mately involved in the initiation and maintenance of AHR. 
In this regard, chemokines could be attractive therapeutic 
targets for the treatment of pulmonary disease with an 
AHR component, in particular asthma. 

Introduction to chemokines 

During the past decade, our understanding of the mecha- 
nisms involved in the initiation and maintenance of pul- 
monary disease has been greatly aided by advances in the 
field of chemokine biology. Chemokines comprise four 
supergene families, classified into groups on the basis of 
the number and arrangement of conserved amino acid 
sequences at the N terminus. Two of these families (the CC 
and CXC chemokine groups) contain over 50 identified 
ligands and at least 14 individual receptors (Table 1). Two 
additional chemokine families (C and CX3C chemokines) 
are small and contain, respectively, lymphotactin and 
fractalkine as their members. Recent knowledge of this 
superfamily has grown significantly as a result of the availability 
of large databases of expressed sequence tags and 
bioinformatics [7]. Furthermore, characterization of these 
chemokines in vivo has identified multiple roles within 
inflammation, including the regulation of leukocyte trafficking, 
the immunomodulation of leukocyte activation, fibrosis, 
angiogenesis, hematopoiesis and organogenesis [8]. 

The biological effects of chemokines are mediated by the 
interaction of these soluble proteins with specific receptors, 
which belong to the superfamily of seven-transmembrane 
G-protein-coupled receptors. So far, 11 CC chemokine 
receptors, five CXC chemokine receptors, one CX3C 
chemokine receptor and one C chemokine receptor have 
been characterized [7,9]. Chemokine receptors exhibit multiple 
ligand specificity, although the chemokine-ligand 
promiscuity does not usually cross the boundaries between 
CC and CXC, except for the promiscuous Duffy antigen 
receptor complex that is believed to act as a sink for 
unbound chemokines. Chemokine receptor distribution on 
leukocytes confers selective chemoattractant activities for 
leukocyte subsets, making them ideal candidates for a role 
in leukocyte subset trafficking at sites of inflammation; that 
is, getting the correct subpopulation of cells to migrate into 
the tissue. Whereas CXC chemokines such as interleukin-8 
(IL-8) activate predominantly neutrophils, CC chemokines 
such as RANTES and eotaxin target a variety of cell types 
including macrophages, eosinophils and basophils. 
However, controversial results have been published regarding 
this distinct chemokine receptor profile on leukocytes, 
particularly in allergic diseases. It has been shown recently 
that, after the appropriate stimuli, the CC chemokine receptors 
CCR1 and CCR3 can be expressed on neutrophils, 
indicating a wider role for CC chemokines than mononuclear 
cell activation and recruitment [10,11]. Furthermore, 
both the CXC chemokine receptors CXCR1 and CXCR2 
have been identified on eosinophils in addition to neutrophils 
[12]. However, chemokine receptor expression is 
not limited to inflammatory cells. It is interesting to note that 
structural cells such as epithelial cells, endothelial cells, 
smooth muscle cells and fibroblasts also express 
chemokine receptors and are able to produce chemokines; 
they are therefore capable of contributing to a wide range of 
biological functions [13-15]. 

Once chemokines are released, they can have profound 
and long-lasting biological effects both in the microenvironment 
of their release and at distant sites. These effects, 
including leukocyte recruitment and activation, smooth 
muscle proliferation, regulation of collagen deposition and 
coordination of fibrosis, might have key individual roles in 
the establishment and maintenance of AHR [4,5]. 



Chemokines and leukocyte recruitment in AHR 

Studies in both animals and humans have demonstrated a 
positive correlation between the inflammatory state of the 
airways and the severity of AHR. However, because the 
type and cause of this inflammation, as well as the extent 
and consequences of the inflammatory process, vary 
between different diseases exhibiting AHR (Table 2), the 
direct contribution of individual cell types or chemokines 
to AHR is not yet clearly understood. As discussed above, 
the distinct pattern of chemokine receptors on leukocytes 
means that chemokines can exert effects on particular 
leukocyte subsets. Therefore, the selective recruitment of 
leukocytes to sites of inflammation in these diseases is 
strongly influenced by the temporal pattern of chemokine 
expression. 



Table 2 

Cellular infiltrate in the airway wall in asthma and chronic 
obstructive pulmonary disease 

Asthma                      Chronic obstructive pulmonary disease 
T lymphocytes, CD4          T lymphocytes, CD8 
CD25                        CD25, VLA-1 
Eosinophil                  M3d eosinophils 
Activated eosinophils       Non-activated eosinophils 
Mast cells                  Mast cells 
Neutrophils                 Neutrophils 
                            Macrophages 



Eosinophil recruitment and AHR 

Lung eosinophilia is a fundamental trait of allergic asthma, 
and infiltration of the airways by eosinophils seems to be 
central in the pathogenesis of this disease [16-18]. 
Eosinophils and their products have been identified in 
sputum, bronchoalveolar lavage fluid (BALF) and biopsy 
material of the airways of patients with asthma. Further- 
more, the number of these cells and the amount of their 
products correlate with the severity of airway reactivity 
[16,17,19,20]. 

Eosinophils contribute to the development of AHR 
through the activation, degranulation and release of pro- 
teins and oxidative products stored in their granules. 
These proteins include major basic protein (MBP), 
eosinophil cationic protein (ECP), eosinophil-derived neu- 
rotoxin (EDN) and eosinophil peroxidase (EPO). In addi- 
tion, eosinophils generate oxidative products and lipid 
mediators, including platelet-activating factor and 
leukotriene C4 (LTC4). The generation of these cytotoxic 
products can cause extensive tissue damage and enhance 
the accumulation of inflammatory cells. Damage to airway 
epithelium appears to correlate with airway hyper-reactivity 
because the loss of epithelium leads to the exposure of 
'irritant' receptors of nerves, which might increase the 
response of the airways to various stimuli. 

Several chemokines, including macrophage chemoattrac- 
tant protein (MCP)-3, macrophage inflammatory protein 
(MIP)-1α, MCP-4, RANTES and eotaxin, elicit the migration 
of eosinophils [21-23] and can confer some degree 
of selectivity on eosinophil recruitment. Specifically, 
eotaxin, a potent activator of eosinophils and T helper 2 
(Th2) lymphocytes, interacts with CCR3 expressed on 
eosinophils [24-28] to cause both degranulation and 
chemotaxis of eosinophils [29,30]. Elevated levels of 
eotaxin detected in the sputum of asthmatics have been 
shown to be correlated with increased eosinophil numbers 
and eosinophil cationic protein levels [31]. In several 



murine models of asthma, a pronounced lung eosinophilia 
was associated with an increase in eotaxin expression; a 
neutralizing antibody against eotaxin significantly inhibited 
eosinophil infiltration after antigen challenge and 
decreased AHR in these animals [28]. Contrasting effects 
on eosinophil recruitment and AHR have been demon- 
strated in eotaxin gene-deficient mice, possibly owing to 
the presence of the other, recently identified, CCR3-spe- 
cific ligands eotaxin-2 and eotaxin-3 [32,33]. In addition to 
eosinophil chemotaxis and activation, eotaxin, in combina- 
tion with IL-5, has been shown to mobilize eosinophils 
from the bone marrow, thereby increasing circulating 
numbers of eosinophils within the blood [34]. However, 
eotaxin is not the only chemokine able to modulate 
eosinophil accumulation within the lung. Murine models of 
allergic inflammation have shown the movement of 
eosinophils during the early stages of asthma to be dependent 
on RANTES and MIP-1α, whereas eotaxin has been 
shown to be necessary for eosinophil accumulation during 
chronic stages of the response [35,36]. Therefore, to 
target chemokines for therapeutic intervention effectively it 
is essential to understand the temporal pattern of 
chemokine release. 

Chemokine-induced recruitment of Th2 cells 
and AHR 

In addition to eosinophils, T cells constitute a large propor- 
tion of the inflammatory cells within the lungs of asthmat- 
ics. Indeed, T-cell-mediated immune responses are 
believed to be important contributors to AHR in asthmatic 
patients through the release of chemokines and cytokines 
that enhance lung inflammation, favor the production of 
IgE, activate eosinophils and mast cells, and directly 
enhance AHR [37-39]. The observation that T cells have 
a role in AHR is supported by findings that the transfer of 
T cells from a hyper-responsive mouse strain into a hypo- 
reactive strain induces non-specific airway reactivity [40]. 
Furthermore, a characterization of lymphocyte populations 
in asthmatics and non-asthmatics has demonstrated differ- 
ences in T cell subtypes in biopsy specimens and BALF 
from patients with asthma: in asthmatics, significantly 
higher numbers of Th2-type cells were seen than in 
control subjects, whereas there was no difference in the 
number of Th1-type cells [41]. Th2-type cells can be dis- 
tinguished by the profile of cytokines that they produce, 
such as IL-4, IL-13 and IL-5, which favor the production of 
IgE and the growth and activation of eosinophils and mast 
cells, in addition to enhancing AHR in vivo [37-39]. 

Although lymphocytes have long been known to accumu- 
late at sites of immune and inflammatory reactions, attrac- 
tants that induce these responses have been identified 
only recently. RANTES, MIP-1α and MIP-1β were the first 
chemokines for which lymphocyte-chemotactic activity 
was reported. The monocyte-chemotactic proteins (MCP- 
1, MCP-2, MCP-3 and MCP-4) are also potent attractants 
of T lymphocytes. Gonzalo et al [28], using neutralizing 
antibodies directed against MCP-1 or MCP-5, significantly 
attenuated the recruitment of both eosinophils and T cells 
to the lung in a murine model of ovalbumin-induced airway 
inflammation, and drastically reduced AHR. In contrast, 
the neutralization of MIP-1α caused only a slight reduction 
in eosinophilia and AHR, and had no effect on T cell accumulation 
[28]. In a separate study by Lukacs et al [42], 
neutralization of MIP-1α or RANTES had no effect on AHR 
in a murine model of allergy, although eosinophilia was 
reduced significantly. 

The expression of chemokine receptors on lymphocytes 
and their responsiveness to chemokines vary considerably 
between subsets. CCR5 is expressed preferentially in Th1 
cells, whereas CCR3 and CCR4 seem to be characteristic 
of Th2 cells [43,44]. It is therefore not surprising that 
chemokines that preferentially recruit Th2-type cells have 
recently been identified. A number of chemokines have 
been shown to have the ability to recruit Th2-type cells 
preferentially, including macrophage-derived chemokine 
(MDC) and I-309 [45,46]. T cells recruited to the lung by 
these chemokines may regulate the persistence and acti- 
vation of other cells such as eosinophils or mast cells in 
the airways of patients with asthma via both direct contact 
and through the release of other inflammatory mediators 
which contribute to enhanced AHR. 

Mast cells and AHR 

Mast cells that are located in mucosal and peribron- 
chovascular areas of the lung are known to be important in 
allergic reactions within the lung. These cells have the 
capacity to release a variety of mediators that can cause 
acute bronchospasm, activate and/or attract other inflam- 
matory cells in the lung, and possibly increase AHR [47]. 
Indeed, there is a strong correlation between amounts of 
histamine in the airways of allergic asthmatics and sensitivity 
of the airways to methacholine [48,49]. 

MCP-1, a CC chemokine that binds CCR2, has been 
shown to induce AHR by the activation of mast cells in the 
lung. Activation of mast cells with MCP-1 causes the 
release of histamine, leukotrienes, platelet-activating factor 
and various proteases that either directly mediate changes 
in AHR or further enhance the recruitment of leukocytes to 
the lungs [36]. Increased levels of MCP-1, in murine 
models of allergic inflammation, have been shown to activate 
mast cells directly [36]. In addition, increased levels 
of MCP-1 have been detected in BALF and bronchial 
tissue from patients with atopic asthma in comparison with 
controls [50,51]. With the use of a murine model of cock- 
roach antigen-induced allergic airway inflammation, it has 
been demonstrated that anti-MCP-1 antibodies inhibit 
AHR to methacholine and attenuate histamine release into 
the BALF; furthermore, in normal mice, instillation of 
MCP-1 induced prolonged airway hyper-reactivity and 
histamine release. In addition, MCP-1 directly induced pulmonary 
mast cell degranulation in vitro [36]. In asthmatic 
patients, histamine and LTC4 either directly induce AHR or 
facilitate the recruitment of leukocytes to the lungs to 
induce AHR indirectly [52,53]. Thus, the induction and 
evolution of allergic airway inflammation, which is dependent 
on the temporal expression of multiple chemokines 
and their ligands, has been shown to play a key role in the 
establishment of AHR. 

The role of airway remodeling and 
subepithelial fibrosis in AHR 

Although several studies show a direct correlation 
between AHR and airway inflammation, the causal rela- 
tionship between leukocyte infiltration and AHR has not 
been finally settled. There is a discordance in the findings 
between investigative groups who have studied the rela- 
tionship between airway inflammation, as assessed by cel- 
lular infiltration, and AHR. Some groups have shown a 
strong relationship between the presence of inflammatory 
cells and enhanced airway responsiveness [16,17,19,20], 
whereas other groups have failed to establish such a rela- 
tionship [18,54-57]. The conflicting evidence might 
reflect the reality that other factors in addition to, or dis- 
tinct from, airway inflammation may modulate AHR. Of par- 
ticular interest is the role that airway remodeling and 
subepithelial fibrosis play in AHR. 

Airway wall thickening, airway smooth muscle 
hypertrophy and subepithelial fibrosis in AHR 

Histologic studies have reported a marked increase in the 
amount of smooth muscle in airways from asthmatic sub- 
jects; this abnormality, along with airway inflammation, is 
thought to contribute to AHR. It is believed that increased 
smooth muscle mass would allow the development of 
greater force and enhanced narrowing of the airway lumen 
to a given contractile stimulus. It has also been shown that 
smooth muscle cells can display different phenotypes 
depending on their environment or culture conditions. 
Smooth muscle cells have been shown to exhibit a classical
contractile phenotype and also a proliferative-synthetic
phenotype, which are capable of producing pro-inflammatory
cytokines, chemokines and growth factors
that further affect the environment within the lung [58]. 
Airway smooth muscle cells releasing chemokines such as 
eotaxin, RANTES, MCP-1, MCP-2 and MCP-3 [59-61] 
augment inflammatory responses within the lung such as 
leukocyte recruitment and activation, as discussed previ- 
ously, that further exacerbate AHR. Because the increase 
in bronchial smooth muscle mass in asthma is due to cell 
hypertrophy in addition to hyperplasia [62], the potential 
relevance of phenotype plasticity and its possible relation- 
ship to altered function of smooth muscle in disease 
states has been suggested. Allergic sensitization and 
exposure to certain cytokines elicit significant functional 
changes [39,63,64] that can alter both contractile and
secretory functions; however, it remains to be seen how
chemokines can alter this phenotype.

Table 3

Elevated chemokines in allergic asthmatic lungs shown in vivo
to participate in AHR

Chemokine elevated          In vivo evidence for
in allergic asthmatics      involvement in AHR so far

Eotaxin                     Yes
Eotaxin-2                   No
RANTES                      Yes
MCP-3                       No
MCP-1                       Yes
MIP-1α                      Yes
CCR3                        Yes

Although fibrosis is an essential component of tissue 
healing and wound repair, clinical studies have demon- 
strated that the degree of subepithelial fibrosis is corre- 
lated with augmented AHR to methacholine [4]. Indeed, a 
buildup of interstitial collagen beneath the airway base- 
ment membrane and subepithelial fibrosis are present in 
the airways of allergic asthmatics [6]. Infiltrating inflamma- 
tory cells such as macrophages, lymphocytes, neutrophils 
and eosinophils participate in the pathogenesis of lung 
fibrosis, through the activation of fibroblasts via the 
release of inflammatory mediators or direct contact 
[65,66]. Recent evidence has shown that MCP-1 
enhances collagen deposition by fibroblasts [67]; there- 
fore, increased expression of this chemokine in the lungs 
of asthmatics might be responsible for the airway remodel- 
ling that can exacerbate AHR. 

Chemokine expression in allergic asthma and 
their therapeutic use 

So far, most of the results indicating a role for chemokines
in AHR have been obtained through murine models 
employing chemokine neutralization, transgenic methods 
or gene knockout methods. The question therefore arises 
as to why chemokines would be beneficial targets for the 
therapeutic treatment of AHR in humans. Clinical studies 
have shown elevated levels of the chemokines and 
chemokine receptors that have been identified in murine 
models and in BALF, bronchial biopsies and sputum from 
allergic asthmatics (Table 3). Eotaxin and CCR3 mRNA and
protein have been found to be significantly elevated in
bronchial mucosal biopsies from atopic asthmatics in 
comparison with normal controls [68]. Furthermore, an 
inverse correlation was made in this study between the 
expression of eotaxin mRNA and the histamine provocative
concentration causing a 20% decrease in FEV1 [68].
Significant correlations with clinical parameters of AHR 
were also found with MCP-1 levels in BALF from patients 
with allergic asthma [69]. A comprehensive study by Ying
et al. [70] measured elevated levels of mRNA for eotaxin,
eotaxin-2, RANTES, MCP-3, MCP-4 and CCR3 in the
bronchial mucosa from allergic asthmatics [67]. In addition,
levels of RANTES, MIP-1α and MCP-1 in BALF have
been shown to be significantly increased 4 h after chal- 
lenge with endobronchial allergen in allergic asthmatics 
compared with levels before the allergen challenge [71]. 
Therefore the chemokines that have been shown to have a 
role in AHR in murine models are elevated in human 
disease and might be potential targets for the develop- 
ment of therapeutic interventions. 

Conclusions 

Taken together, both experimental evidence from murine 
models and clinical evidence of elevated chemokine and 
chemokine receptor levels in the allergic asthmatic lung 
suggest that chemokines, and their receptors, seem to be 
effective targets for the development of therapeutic inter- 
ventions to be used in addition to current therapy for the 
treatment of AHR. However, it remains to be seen whether 
the first clinical trials bear out this promise.

The challenges of genome sequence annotation
or "The devil is in the details"



Temple F. Smith and Xiaolin Zhang 

Two powerful, competing pressures are act- 
ing on various genome sequencing projects: 
One, to release new sequences as quickly as 
possible; and two, to provide them with 
maximally complete and accurate annotation.
This rather incongruent combination
has led to a strong interest in developing effi- 
cient and accurate automated, large-scale 
sequence annotation procedures. 

There have, in fact, been a number of 
attempts in both industry and academia to 
speed new sequence annotation. In their 
simplest form, these have been little more 
than post-processors acting on standard 
high-speed sequence similarity search tools 
such as BLAST. The post-processing assigns 
the annotation from the best-matched previ- 
ously known sequence to each new sequence. 
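
The post-processing step described above can be sketched in a few lines. This is a minimal illustration only, assuming the similarity search has already been run and its hits collected; the `Hit` record, its field names and the `min_score` cutoff are hypothetical and do not come from any particular search tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One database match for a new sequence (hypothetical record layout)."""
    subject_id: str   # identifier of the matched database entry
    score: float      # similarity score; higher means a better match
    annotation: str   # free-text annotation stored with that entry


def inherit_best_match(hits: list[Hit], min_score: float = 0.0) -> Optional[str]:
    """Return the annotation of the top-scoring hit, or None if no hit qualifies.

    This is the naive inheritance rule: whatever text is attached to the
    best-matched entry is copied to the new sequence without review.
    """
    qualifying = [h for h in hits if h.score >= min_score]
    if not qualifying:
        return None
    return max(qualifying, key=lambda h: h.score).annotation
```

Note that nothing in this rule checks which part of the matched entry the annotation refers to, or whether the entry's wording is itself reliable, which is the weakness the article goes on to discuss.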

This is, of course, a generalization of successful
approaches used by many researchers
to assign probable functions to new 
sequences when previously studied and rec- 
ognizable homologs exist. However, when 
applied in an automated manner to large 
data sets with minimum review, such 
approaches can lead to serious degradation 
of the wealth of incoming genomic data. 

There are more problems with the simple 
best match functional annotation inheritance 
(BMAI) than the two traditionally recognized, 
those being the assessing of biological signifi- 
cance in terms of match statistical significance, 
and the choice between the sensitivity of the 
very fast, but approximate, sequence similarity
search algorithms and the mathematically rig- 
orous, but much slower, optimal algorithms. 

In the first place, it is easy to assign various 
measures of confidence to new annotation
based on match statistics, and there is good
evidence that approximate maximum similarity
tools such as BLAST do nearly as well as
any of the slower, full dynamic programming 
methods. Second, the newer versions of 
BLAST have high sensitivity, identifying local 
sequence pairwise similarities, including 
alignment gaps. The inclusion of alignment 
gaps was one of the main advantages of the 
slower dynamic programming methods.

No, the major problems associated with
nearly all of the current automated annota- 
tion approaches are — paradoxically — minor 
database annotation inconsistencies (and a 
few outright errors). This is particularly true 
for the large and often complex protein fami- 
lies. Why are these the major problems, 
rather than the two more obvious ones pre- 
viously mentioned? 

Clearly, for researchers studying a particu- 
lar protein family, most database annotation 
inconsistencies make little difference in the 
search for new, even distant members. A local 
expert either knows the range and/or history
of the annotation terminology used by col- 
leagues in different subfields, or perhaps 
more importantly, the expert will spend the 
time to backtrack apparent inconsistencies. 

Even in those cases involving structurally 
complex proteins composed of multiple 
domains, all of which may not be fully or 
properly annotated, the expert generally 
carefully dissects matches to distinct 
domains, and backtracks each domain's 
annotations. However, in the large-scale 
genomic projects, having a local expert to 
work on each protein family is not an option. 
Yet the integration of genomic information 
across multiple protein families, multiple 
fields of expertise and
taxa, is just what is envisioned
to form the foundations
of the next
century's biology and 
biotechnology. 

The basic problem of 
inconsistent nomenclature 
arises largely because 
sequence information and
its annotation derives from 
many diverse subdivisions 
of the biological sciences 
during a time of rapid 
change in our understand- 
ing. In an emerging field 
such as molecular biology, 
let alone "comparative 
genomics," strictly con- 
trolled vocabularies would 
not only be difficult to 
impose, but are probably 
undesirable! The evolution 
and refinement of the 
vocabulary is an anticipat- 
ed outcome of our 
increasing knowledge. 



Some inconsistencies are simple, such as
the reference to tRNA synthetases in fungi as
tRNA ligases (which of course they are) or the
use by Americans and most Europeans of
dihydroxyacetone-P for a glycolytic intermediate
that the Japanese and English generally
call glycerone-P. There are many cases of
equivalent, but different, terminology. For
example, in the well-studied G protein case,
among 27 distinct G β-subunit GenBank/
SWISS-PROT entries, there are 18 different
protein names or keyword sets. A list of synonyms
can be constructed in such cases,
some of which will be species or field specific.

There are numerous cases in which proteins
of very different current functions are
homologous in that they evolved from a
common ancestor and will match with significant
sequence similarity. For example,
numerous proteins sharing multiple WD-repeats
have been labeled transducin-like or
transducin homologs, yet share no common
signal transduction function¹. The rather
widespread improper use of synthetase for
synthase and the converse, however, cannot
be fixed by a thesaurus, since whether the
enzyme in question requires ATP or not is
not a matter of alternate terminology. Without
the careful use of synonym tables in
conjunction with review of commonly misused
terminology, any simple BMAI
approach will often end up propagating the
undesirable or erroneous annotations.
[...] propagation of faulty annotation
[...] similar of the domains, and at worst
[...] the annotation of a nonshared
domain from the matched protein.

Figure 1. An example of genes having the potential [...]
annotation inheritance transitivity. The three [...]
proteins, b1262, YKL211C, and b1263, share [...]
domain in common. Domains are labeled by color: [...],
phosphoribosyl anthranilate isomerase; green, indole-3-glycerol
phosphate synthase; yellow, anthranilate synthase
subunit II; blue, anthranilate phosphoribosyltransferase.

The first of these, incomplete annotation,
is seen in the recently released Escherichia coli
genome data for ORF b1262, a 453-residue,
bifunctional protein². Here, the first 253
amino acid residues comprise the indole-3-glycerol
phosphate synthase domain. This
matches single-domain homologs in
Methanococcus jannaschii and Bacillus subtilis
and the carboxy-terminal domain of the protein
product of one yeast gene, YKL211C. The
second domain of the E. coli protein, residues
259 through 443, matches the N-phosphoribosyl
anthranilate isomerase, single-domain
protein in M. jannaschii, B. subtilis, and yeast
(and this function is currently unannotated).

An incorrect inheritance via a matched
multidomain protein is seen in the M. jannaschii
ORF pair, MJ0234 and MJ0238. Both
match the E. coli ORF b1263, a bifunctional
enzyme of two separate domains. Both M.
jannaschii genes have been annotated, however,
by only one of the two functions:
anthranilate synthase subunit II, which is
associated only with the first 176 of b1263's
531 amino acids, and that region is matched
only by MJ0238 (Fig. 1).

What must be done to avoid continued 
annotation inconsistency, incompleteness, and 
erroneous propagation? First, any automation 
must be rather sophisticated. It must, for a 
start, recognize large differences in the length 
of matching sequences; it must associate anno- 
tation with specific subsequences; it must rec- 
ognize all differences among the annotations 
of the homologs to the matched sequence; 
and, whenever possible, sequence similarity 
should be identified via shared conserved 
sequence patterns or profiles that have been
carefully annotated, consistent with the entire
family characterized by that pattern. All 
approaches should exploit the best available 
synonym tables, such as those available 
through resources like PROSITE, the Enzyme 
Commission, or the US National Library of 
Medicine's UMLS database. Finally, any annotation
strategy must be designed to support an
evolving nomenclature and rapidly expanding 
knowledge base. 
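
Two of the requirements listed above, recognizing large length differences and tying annotation to specific subsequences, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `Domain` record, the 50% coverage threshold and the 1.5 length ratio are hypothetical, and in the usage example only the first boundary (residues 1-176 of the 531-residue b1263) is taken from the text; the second domain's range is simply assumed to cover the remainder.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """An annotated region of a database protein (hypothetical record)."""
    start: int      # first residue, 1-based and inclusive
    end: int        # last residue, inclusive
    function: str   # annotation that applies to this region only


def transferable_functions(domains, aligned_start, aligned_end, min_coverage=0.5):
    """Return functions of domains that the aligned region actually covers.

    A domain's annotation is transferred only if at least `min_coverage`
    of its residues fall inside the aligned stretch of the matched protein.
    """
    kept = []
    for d in domains:
        overlap = min(d.end, aligned_end) - max(d.start, aligned_start) + 1
        if overlap > 0 and overlap / (d.end - d.start + 1) >= min_coverage:
            kept.append(d.function)
    return kept


def length_mismatch(query_len, subject_len, max_ratio=1.5):
    """Flag matches between sequences of very different (positive) lengths."""
    longer, shorter = max(query_len, subject_len), min(query_len, subject_len)
    return longer / shorter > max_ratio


# Illustrative domain layout for b1263; the second range is an assumption.
b1263 = [
    Domain(1, 176, "anthranilate synthase subunit II"),
    Domain(177, 531, "anthranilate phosphoribosyltransferase"),
]
```

With this layout, a query that aligns only to residues 200-520 would inherit the phosphoribosyltransferase annotation and not the synthase subunit II label, and a short query matched against the full 531-residue entry would be flagged for review rather than annotated outright.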

Even if it takes an extended period of
time to annotate the new genome data more
carefully and completely now, it will surely
be more cost effective than redoing it later.
Recall that the correcting and/or updating of
all of the historical data in largely archival
sequence databases such as GenBank or 
SWISS-PROT, has not yet been completed— 
probably for good reasons of cost and time. 
We in the basic research and biotechnology 
communities must not let our excitement or 
our impatience for the new data degrade its 
annotation and longer-term utility. 



1. Neer, E.J., Schmidt, C.J., Nambudripad, R., and
Smith, T.F. 1994. The ancient regulatory-protein family
of WD-repeat proteins. Nature 371:297-300.

2. ORF data obtained from:

M. jannaschii: www.tigr.org/tdb/mdb/mdb.html
E. coli: www.genetics.wisc.edu
S. cerevisiae: speedy.mips.biochem.mpg.de
B. subtilis: www.pasteur.fr/subWsubti_flaLhtml

Errors in genome annotation



At the time that Watson and Crick proposed a structure
for DNA, a visionary might have suggested that the
complete genetic sequence of an organism would eventu- 
ally be known. However, nobody could have realistically 
proposed that machines could automatically indicate gene 
functions. Yet precisely this has been achieved: with no 
laboratory experiments at all, the roles of most genes in 
several organisms have been reported. 

But how reliable are these functional assignments, upon
which we depend for understanding genes and genomes? 
Without laboratory experiments to verify the compu- 
tational methods and their expert analysis, it is impossible 
to know for certain. However, a simple procedure can 
place a rough upper bound on their accuracy. I have compared
three different groups' functional annotations for
the Mycoplasma genitalium genome (Fig. 1). Where two
groups' descriptions are completely incompatible, at least
one must be in error. In my analysis, there is no penalty
for vague or absent functional assignment. Furthermore, I
always assume that as many groups as possible have the
right description (Fig. 2).

Three dots represent (left to right) Frasier et al., Koonin et al. and Ouzounis et
al. annotations for each of the 468 M. genitalium genes. (Tentative cases
from Ouzounis et al. were not used.) An open black circle indicates lack of a
substantial functional annotation. Compatible annotations are colored
identically, while conflicting annotations are in different colors. It is unknown
which, if any, of the annotations are actually correct. There are 300 cases
where Ouzounis et al. simply reported the SWISS-PROT annotation of the same
M. genitalium gene, indicated by colored open circles. Because Frasier et al.
annotation played a role in SWISS-PROT descriptions, these Ouzounis et al.
annotations were not included in this analysis. Though not incorporated in
Table 1, the color indicates the compatibility of the functional annotation. The
conflict/compatibility analysis here is itself certain to have errors; however,
[...] the magnitude of the [...] annotation error rate.

The results are disappointing for those expecting reliable
annotation (Table 1). M. genitalium was reported to have
just 468 genes, many of which are fundamental for all life and
therefore easy to analyse. Nonetheless, the error rate is at
least 8% for the 340 genes annotated by two or three groups.
This value may not be uniform across the three groups, nor
does it reflect the overall significance of a group's [...].
Genes annotated by only one group were not considered,
but include such improbable bacterial functions as [...]
enhancing factor, mitochondrial polymerase, and [...]
receptor. This analysis cannot detect those cases where
multiple groups arrived at consistent but wrong conclusions
- a likely occurrence because all relied on [...]
methods and data. This evaluation also ignores [...]
disagreements in annotation, and disparities in [...]
specificity (possibly indicating problematic overprediction
of function⁴). Therefore, the true error rate [...]
greater than these figures indicate.
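
The counting rule behind this lower bound can be sketched as follows. This is an illustrative reading of the procedure, not the author's actual code: each group's annotation for a gene is reduced to a hypothetical compatibility label (identical labels mean compatible descriptions, `None` means a vague or absent assignment), as many groups as possible are assumed correct, and the remainder are counted as errors. Pooling errors over all annotations from genes annotated by at least two groups is an assumption about how the overall rate is aggregated.

```python
from collections import Counter


def minimum_errors(labels):
    """Return (errors, annotated) for one gene.

    `labels` holds one compatibility label per group; None marks a vague or
    absent assignment, which is not penalized. The largest set of mutually
    compatible annotations is assumed correct; the rest count as errors.
    """
    present = [x for x in labels if x is not None]
    if not present:
        return 0, 0
    largest_agreeing = max(Counter(present).values())
    return len(present) - largest_agreeing, len(present)


def minimum_error_rate(genes):
    """Pool minimum errors over genes annotated by two or more groups."""
    errors = annotated = 0
    for labels in genes.values():
        e, n = minimum_errors(labels)
        if n >= 2:  # genes annotated by only one group are not considered
            errors += e
            annotated += n
    return errors / annotated if annotated else 0.0
```

Under this rule, two conflicting annotations give one error in two, a two-against-one split gives one error in three, and three mutually contradictory annotations give two errors in three. Cases where all groups agree on a wrong function contribute nothing, which is why the result is only a lower bound.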

There are several possible reasons why the functional
analyses have mistakes, as described at greater length elsewhere⁵.
For example, it may be that the similarity
between the genomic query and database sequence is
insufficient to reliably detect homology, an issue solved
by appropriate use of modern and accurate sequence comparison
procedures¹⁰. A more difficult problem is accurate
inference of function from homology. Typical database
searching methods are valuable for finding evolutionarily
related proteins, but if there are only about 1000 [...]
superfamilies in nature¹¹, then most homologs [...]
have different molecular and cellular functions.

The annotation problem escalates dramatically beyond
the single genome, for genes with incorrect functions are
entered into public databases⁴. Subsequent searches
against these databases then cause errors to propagate to
future functional assignments. The procedure need [...]
only a few times without corrections before the resources
that made computational function determination possible
- the annotation databases - are so polluted as to be
almost useless. To prevent errors from spreading out of
control, database curation by the scientific community
will be essential⁴,¹¹.
To ensure that databases are kept usable, the [...]
gene annotation should be clear: does it indicate [...]
ortholog, and/or functional equivalence? Fortunately, [...]
databases already incorporate this information
(e.g. Ref. 14). Errors will, of course, still creep in, and to
eliminate the collateral damage, computational assignments
should clearly be flagged as such, and they should
also indicate their source (which would allow propagation
of corrections) and a measure of confidence in their accuracy.
This will require new research and development of
algorithms and databases, and a broad commitment to
maintaining these resources. In short, the accessible documentation
needed for reproducibility of a computational
function determination should be commensurate with that
for a corresponding laboratory bench experiment.

(a) Consistent annotations. Annotations were generally considered consistent for this analysis if either the function or the gene name match (e.g. mg463, mg010).
An exception is when one group uses a gene name and another specifically notes that the current gene is a paralog and not identical (consider mg[...]). When the
annotations from different groups were compatible, but of different levels of specificity, this was considered a correct assignment (e.g. mg225). The difficulty of
reconciling pairs of descriptions to determine whether they reflect compatible functions makes this analysis imprecise. Generally, the approach here is generous
and should err on the side of detecting too few errors; it is usually more permissive than Ref. 5. mg463: Frasier et al. and Koonin et al. describe different aspects
of function but give the same gene name. The Ouzounis et al. description is compatible with that from Koonin et al., but less specific. All three annotations are
considered correct for this analysis. mg010: Frasier et al. and Ouzounis et al. agree that this is a DNA primase. Koonin et al. use a different [...]
state that this is a truncated protein. Because of the common functional descriptions, all three are considered correct. However, if Koonin et al. had been
explicit in indicating a functional difference, then their annotation would have been marked as conflicting. (Note that mg250 is also annotated as a DNA primase
by all three groups.) mg225: the Ouzounis et al. annotation of histidine permease is more specific than the Koonin et al. description of amino acid permease. It may
be that histidine permease is an (incorrect) overprediction of function, or it could be correct. The two annotations are considered consistent, and the decision of
Frasier et al. not to provide a function is not penalized. (b) Inconsistent annotations. mg302: lack of a functional assignment from Frasier et al. is not penalized.
The Koonin et al. and Ouzounis et al. annotations are wholly inconsistent. This leads to a conflict and a minimum error rate of 50%. Note that the assessment
methodology also behaves correctly when two annotators provide different functions for a multi-functional enzyme: each of the annotators is half right and half
wrong, and the assessment assigns a 50% error rate. mg448: Frasier et al. and Ouzounis et al. both describe the gene as pilB. The encoded protein is involved in
pilin formation and its biochemical function is catalysis of methionine sulfoxide oxidation/reduction in proteins. The Koonin et al. annotation, chaperone-like
protein, could conceivably be compatible but this is not likely. Because of uncertainty regarding compatibility of the Koonin et al. annotation and its qualification
as putative, this set of annotations is right on the threshold of consideration. For this analysis, the Koonin et al. annotation was considered to be in conflict with
the others, giving a minimum error rate of 33%. mg085: all three groups provide contradictory functions. The function described by Frasier et al. of HMG-CoA
reductase is EC 1.1.1.34, while the NADH-ubiquinone oxidoreductase annotated by Ouzounis et al. (nu6m_marpo) is EC 1.6.5.3. Neither enzyme uses ATP or GTP,
as specified by Koonin et al. The analysis assumes one is correct and marks two incorrect. Note: Ouzounis et al. annotations equivalent to SWISS-PROT included in
these examples are not included in the Table 1 analysis.

Contributed by David Botstein and Arnold J. Levine, October 21, 1998

ABSTRACT Wnt family members are critical to many 
developmental processes, and components of the Wnt signal- 
ing pathway have been linked to tumorigenesis in familial and 
sporadic colon carcinomas. Here we report the identification 
of two genes, WISP-1 and WISP-2, that are up-regulated in the 
mouse mammary epithelial cell line C57MG transformed by 
Wnt-1, but not by Wnt-4. Together with a third related gene, 
WISP-3, these proteins define a subfamily of the connective
tissue growth factor family. Two distinct systems demon- 
strated WISP induction to be associated with the expression of 
Wnt-1. These included (i) C57MG cells infected with a Wnt-1
retroviral vector or expressing Wnt-1 under the control of a
tetracycline repressible promoter, and (ii) Wnt-1 transgenic
mice. The WISP-1 gene was localized to human chromosome 
8q24.1-8q24.3. WISP-1 genomic DNA was amplified in colon 
cancer cell lines and in human colon tumors and its RNA 
overexpressed (2- to > 30-fold) in 84% of the tumors examined 
compared with patient-matched normal mucosa. WISP-3 
mapped to chromosome 6q22-6q23 and also was overex- 
pressed (4- to >40-fold) in 63% of the colon tumors analyzed. 
In contrast, WISP-2 mapped to human chromosome 20q12-20q13
and its DNA was amplified, but RNA expression was
reduced (2- to >30-fold) in 79% of the tumors. These results 
suggest that the WISP genes may be downstream of Wnt-1 
signaling and that aberrant levels of WISP expression in colon 
cancer may play a role in colon tumorigenesis. 



Wnt-1 is a member of an expanding family of cysteine-rich, 
glycosylated signaling proteins that mediate diverse develop- 
mental processes such as the control of cell proliferation, 
adhesion, cell polarity, and the establishment of cell fates (1, 
2). Wnt-1 originally was identified as an oncogene activated by 
the insertion of mouse mammary tumor virus in virus-induced 
mammary adenocarcinomas (3, 4). Although Wnt-1 is not 
expressed in the normal mammary gland, expression of Wnt-1 
in transgenic mice causes mammary tumors (5). 

In mammalian cells, Wnt family members initiate signaling 
by binding to the seven-transmembrane spanning Frizzled 
receptors and recruiting the cytoplasmic protein Dishevelled 
(Dsh) to the cell membrane (1, 2, 6). Dsh then inhibits the 
kinase activity of the normally constitutively active glycogen 
synthase kinase-3β (GSK-3β) resulting in an increase in
β-catenin levels. Stabilized β-catenin interacts with the transcription
factor TCF/Lef1, forming a complex that appears in
the nucleus and binds TCF/Lef1 target DNA elements to
activate transcription (7, 8). Other experiments suggest that
the adenomatous polyposis coli (APC) tumor suppressor gene
also plays an important role in Wnt signaling by regulating
β-catenin levels (9). APC is phosphorylated by GSK-3β, binds
to β-catenin, and facilitates its degradation. Mutations in
either APC or β-catenin have been associated with colon
carcinomas and melanomas, suggesting these mutations con- 
tribute to the development of these types of cancer, implicating 
the Wnt pathway in tumorigenesis (1). 

Although much has been learned about the Wnt signaling 
pathway over the past several years, only a few of the tran- 
scriptionally activated downstream components activated by 
Wnt have been characterized. Those that have been described 
cannot account for all of the diverse functions attributed to 
Wnt signaling. Among the candidate Wnt target genes are 
those encoding the nodal-related 3 gene, Xnr3, a member of 
the transforming growth factor (TGF)-β superfamily, and the
homeobox genes, engrailed, goosecoid, twin (Xtwn), and siamois 
(2). A recent report also identifies c-myc as a target gene of the 
Wnt signaling pathway (10). 

To identify additional downstream genes in the Wnt signal- 
ing pathway that are relevant to the transformed cell pheno- 
type, we used a PCR-based cDNA subtraction strategy, sup- 
pression subtractive hybridization (SSH) (11), using RNA 
isolated from C57MG mouse mammary epithelial cells and 
C57MG cells stably transformed by a Wnt-1 retrovirus. Over- 
expression of Wnt-1 in this cell line is sufficient to induce a 
partially transformed phenotype, characterized by elongated 
and refractile cells that lose contact inhibition and form a 
multilayered array (12, 13). We reasoned that genes differen- 
tially expressed between these two cell lines might contribute 
to the transformed phenotype. 

In this paper, we describe the cloning and characterization 
of two genes up-regulated in Wnt-1 transformed cells, WISP-1 
and WISP-2, and a third related gene, WISP-3. The WISP genes 
are members of the CCN family of growth factors, which 
includes connective tissue growth factor (CTGF), Cyr61, and 
nov, a family not previously linked to Wnt signaling. 

MATERIALS AND METHODS 

SSH. SSH was performed by using the PCR-Select cDNA 
Subtraction Kit (CLONTECH). Tester double-stranded 

Abbreviations: TGF, transforming growth factor; CTGF, connective 
tissue growth factor; SSH, suppression subtractive hybridization; 
VWC, von Willebrand factor type C module. 
Data deposition: The sequences reported in this paper have been 
deposited in the Genbank database (accession nos. AF100777, 
AF100778, AF100779, AF100780, and AF100781).
†To whom reprint requests should be addressed. e-mail: diane@gene.com.

cDNA was synthesized from 2 μg of poly(A)+ RNA isolated
from the C57MG/Wnt-1 cell line and driver cDNA from 2 μg
of poly(A)+ RNA from the parent C57MG cells. The subtracted
cDNA library was subcloned into a pGEM-T vector for
further analysis. 

cDNA Library Screening. Clones encoding full-length 
mouse WISP-1 were isolated by screening a λgt10 mouse
embryo cDNA library (CLONTECH) with a 70-bp probe from
the original partial clone 568 sequence corresponding to amino
acids 128-169. Clones encoding full-length human WISP-1
were isolated by screening λgt10 lung and fetal kidney cDNA
libraries with the same probe at low stringency. Clones encoding
full-length mouse and human WISP-2 were isolated by
screening a C57MG/Wnt-1 or human fetal lung cDNA library
with a probe corresponding to nucleotides 1463-1512. Full-length
cDNAs encoding WISP-3 were cloned from human
bone marrow and fetal kidney libraries. 

Expression of Human WISP RNA. PCR amplification of 
first-strand cDNA was performed with human Multiple Tissue 
cDNA panels (CLONTECH) and 300 μM of each dNTP at
94°C for 1 sec, 62°C for 30 sec, 72°C for 1 min, for 22-32 cycles. 
WISP and glyceraldehyde-3-phosphate dehydrogenase primer 
sequences are available on request. 

In Situ Hybridization. ³³P-labeled sense and antisense riboprobes
were transcribed from an 897-bp PCR product corresponding
to nucleotides 601-1440 of mouse WISP-1 or a
294-bp PCR product corresponding to nucleotides 82-375 of 
mouse WISP-2. All tissues were processed as described (40). 

Radiation Hybrid Mapping. Genomic DNA from each 
hybrid in the Stanford G3 and Genebridge4 Radiation Hybrid 
Panels (Research Genetics, Huntsville, AL) and human and 
hamster control DNAs were PCR-amplified, and the results 
were submitted to the Stanford or Massachusetts Institute of 
Technology web servers. 

Cell Lines, Tumors, and Mucosa Specimens. Tissue speci- 
mens were obtained from the Department of Pathology (Uni- 
versity of Pittsburgh) for patients undergoing colon resection 
and from the University of Leeds, United Kingdom. Genomic 
DNA was isolated (Qiagen) from the pooled blood of 10 
normal human donors, surgical specimens, and the following 
ATCC human cell lines: SW480, COLO 320DM, HT-29, 
WiDr, and SW403 (colon adenocarcinomas), SW620 (lymph 
node metastasis, colon adenocarcinoma), HCT 116 (colon 
carcinoma), SK-CO-1 (colon adenocarcinoma, ascites), and 
HM7 (a variant of ATCC colon adenocarcinoma cell line LS 
174T). DNA concentration was determined by using Hoechst 
dye 33258 intercalation fluorimetry. Total RNA was prepared
by homogenization in 7 M GuSCN followed by centrifugation 
over CsCl cushions or prepared by using RNAzol. 

Gene Amplification and RNA Expression Analysis. Relative 
gene amplification and RNA expression of WISPs and c-myc in 
the cell lines, colorectal tumors, and normal mucosa were 
determined by quantitative PCR. Gene-specific primers and 
fluorogenic probes (sequences available on request) were 
designed and used to amplify and quantitate the genes. The 
relative gene copy number was derived by using the formula 
2^(ΔCt), where ΔCt represents the difference in amplification
cycles required to detect the WISP genes in peripheral blood
lymphocyte DNA compared with colon tumor DNA or colon
tumor RNA compared with normal mucosal RNA. The
δ-method was used for calculation of the SE of the gene copy
number or RNA expression level. The WISP-specific signal was
normalized to that of the glyceraldehyde-3-phosphate dehy- 
drogenase housekeeping gene. All TaqMan assay reagents 
were obtained from Perkin-Elmer Applied Biosystems. 
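
The relative copy-number calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that ΔCt is taken as the reference Ct minus the sample Ct after each has been normalized to the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) signal, and that the δ-method is applied to first order. The function names and all Ct values are hypothetical.

```python
import math

def delta_ct(ct_target_ref, ct_gapdh_ref, ct_target_sample, ct_gapdh_sample):
    """GAPDH-normalized difference in threshold cycles (reference minus sample).

    The sign convention is an assumption; the text only says 'difference'.
    """
    ref = ct_target_ref - ct_gapdh_ref
    sample = ct_target_sample - ct_gapdh_sample
    return ref - sample

def relative_level(dct):
    """Relative gene copy number (or RNA level) as 2^(delta Ct)."""
    return 2.0 ** dct

def relative_level_se(dct, se_dct):
    """First-order delta-method SE: |d(2^x)/dx| * SE(x) = ln(2) * 2^x * SE(x)."""
    return math.log(2.0) * (2.0 ** dct) * se_dct

# Hypothetical Ct values, for illustration only.
dct = delta_ct(ct_target_ref=27.0, ct_gapdh_ref=22.0,
               ct_target_sample=25.0, ct_gapdh_sample=22.0)
fold = relative_level(dct)
se = relative_level_se(dct, se_dct=0.1)
print(f"relative level = {fold:.2f} +/- {se:.2f}")
```

With these made-up inputs the target is detected two cycles earlier in the sample than in the reference, which corresponds to a 4-fold relative level.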

RESULTS 

Isolation of WISP-1 and WISP-2 by SSH. To identify Wnt- 
1-inducible genes, we used the technique of SSH using the 
mouse mammary epithelial cell line C57MG and C57MG cells
that stably express Wnt-1 (11). Candidate differentially ex- 
pressed cDNAs (1,384 total) were sequenced. Thirty-nine 
percent of the sequences matched known genes or homo- 
logues, 32% matched expressed sequence tags, and 29% had 
no match. To confirm that the transcript was differentially 
expressed, semiquantitative reverse transcription-PCR and 
Northern analysis were performed by using mRNA from the 
C57MG and C57MG/Wnt-1 cells. 

Two of the cDNAs, WISP-1 and WISP-2, were differentially 
expressed, being induced in the C57MG/Wnt-1 cell line, but 
not in the parent C57MG cells or C57MG cells overexpressing 
Wnt-4 (Fig. 1 A and B). Wnt-4, unlike Wnt-1, does not induce 
the morphological transformation of C57MG cells and has no 
effect on β-catenin levels (13, 14). Expression of WISP-1 was
up-regulated approximately 3-fold in the C57MG/Wnt-1 cell 
line and WISP-2 by approximately 5-fold by both Northern 
analysis and reverse transcription-PCR. 

An independent, but similar, system was used to examine 
WISP expression after Wnt-1 induction. C57MG cells express-
ing the Wnt-1 gene under the control of a tetracycline-
repressible promoter produce low amounts of Wnt-1 in the 
repressed state but show a strong induction of Wnt-1 mRNA 
and protein within 24 hr after tetracycline removal (8). The 
levels of Wnt-1 and WISP RNA isolated from these cells at 
various times after tetracycline removal were assessed by 
quantitative PCR. Strong induction of Wnt-1 mRNA was seen 
as early as 10 hr after tetracycline removal. Induction of WISP 
mRNA (2- to 6-fold) was seen at 48 and 72 hr (data not shown). 
These data support our previous observations that show that 
WISP induction is correlated with Wnt-1 expression. Because 
the induction is slow, occurring after approximately 48 hr, the 
induction of WISPs may be an indirect response to Wnt-1 
signaling. 

cDNA clones of human WISP-1 were isolated and the 
sequence compared with mouse WISP-1. The cDNA sequences 
of mouse and human WISP-1 were 1,766 and 2,830 bp in length, 
respectively, and encode proteins of 367 aa, with predicted 
relative molecular masses of ≈40,000 (Mr 40 K). Both have
hydrophobic N-terminal signal sequences, 38 conserved cys- 
teine residues, and four potential N-linked glycosylation sites 
and are 84% identical (Fig. 2A).

Full-length cDNA clones of mouse and human WISP-2 were 
1,734 and 1,293 bp in length, respectively, and encode proteins 
of 251 and 250 aa, respectively, with predicted relative molec-
ular masses of ≈27,000 (Mr 27 K) (Fig. 2B). Mouse and human
WISP-2 are 73% identical. Human WISP-2 has no potential 
N-linked glycosylation sites, and mouse WISP-2 has one at 

Fig. 1. WISP-1 and WISP-2 are induced by Wnt-1, but not Wnt-4,
expression in C57MG cells. Northern analysis of WISP-1 (A) and
WISP-2 (B) expression in C57MG, C57MG/Wnt-1, and C57MG/
Wnt-4 cells. Poly(A)+ RNA (2 μg) was subjected to Northern blot
analysis and hybridized with a 70-bp mouse WISP-1-specific probe
(amino acids 278-300) or a 190-bp WISP-2-specific probe (nucleotides
1438-1627) in the 3' untranslated region. Blots were rehybridized with
human β-actin probe.



Cell Biology, Medical Sciences: Pennica et al. 



Proc. Natl. Acad. Sci. USA 95 (1998) 14719 



[Fig. 2 alignment image: (A) mouse.WISP-1 and human.WISP-1; (B) mouse.WISP-2 and human.WISP-2; IGF-BP, VWC, and TSP domains marked.]

Fig. 2. Encoded amino acid sequence alignment of mouse and 
human WISP-1 (A) and mouse and human WISP-2 (B). The potential 
signal sequence, insulin-like growth factor-binding protein (IGF-BP), 
VWC, thrombospondin (TSP), and C-terminal (CT) domains are 
underlined. 

position 197. WISP-2 has 28 cysteine residues that are con- 
served among the 38 cysteines found in WISP-1. 

Identification of WISP-3. To search for related proteins, we
screened expressed sequence tag (EST) databases with the 
WISP-1 protein sequence and identified several ESTs as 
potentially related sequences. We identified a homologous 
protein that we have called WISP-3. A full-length human 
WISP-3 cDNA of 1,371 bp was isolated corresponding to those
ESTs that encode a 354-aa protein with a predicted molecular 
mass of 39,293. WISP-3 has two potential N-linked glycosyl- 
ation sites and 36 cysteine residues. An alignment of the three 
human WISP proteins shows that WISP-1 and WISP-3 are the 
most similar (42% identity), whereas WISP-2 has 37% identity 
with WISP-1 and 32% identity with WISP-3 (Fig. 3A). 
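
The percent-identity figures quoted above come from pairwise comparison of aligned sequences. A minimal sketch of such a calculation follows. It assumes identity is counted over alignment columns in which neither sequence has a gap; the paper does not state which denominator was used. The function name and the toy sequences are hypothetical and are not WISP sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Toy aligned fragments, for illustration only.
identity = percent_identity("MCK-GCA", "MCRTGC-")
print(f"{identity:.1f}% identity")
```

In the toy pair, five columns are free of gaps and four of them match.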

WISPs Are Homologous to the CTGF Family of Proteins. 
Human WISP-1, WISP-2, and WISP-3 are novel sequences;
however, mouse WISP-1 is the same as the recently identified
Elm1 gene. Elm1 is expressed in low, but not high, metastatic
mouse melanoma cells, and suppresses the in vivo growth and 
metastatic potential of K-1735 mouse melanoma cells (15). 
Human and mouse WISP-2 are homologous to the recently 
described rat gene, rCop-1 (16). Significant homology (36- 
44%) was seen to the CCN family of growth factors. This family 
includes three members, CTGF, Cyr61, and the protoonco- 
gene nov. CTGF is a chemotactic and mitogenic factor for 
fibroblasts that is implicated in wound healing and fibrotic 
disorders and is induced by TGF-β (17). Cyr61 is an extracel-
lular matrix signaling molecule that promotes cell adhesion,
proliferation, migration, angiogenesis, and tumor growth (18, 
19). nov (nephroblastoma overexpressed) is an immediate 
early gene associated with quiescence and found altered in 
Wilms tumors (20). The proteins of the CCN family share 
functional, but not sequence, similarity to Wnt-1. All are 
secreted, cysteine-rich heparin binding glycoproteins that as- 
sociate with the cell surface and extracellular matrix. 

WISP proteins exhibit the modular architecture of the CCN 
family, characterized by four conserved cysteine-rich domains 
(Fig. 3B) (21). The N-terminal domain, which includes the first
12 cysteine residues, contains a consensus sequence (GCGC- 
CXXC) conserved in most insulin-like growth factor (IGF)- 
Fig. 3. (A) Encoded amino acid sequence alignment of human
WISPs. The cysteine residues of WISP-1 and WISP-2 that are not 
present in WISP-3 are indicated with a dot. (B) Schematic represen- 
tation of the WISP proteins showing the domain structure and cysteine 
residues (vertical lines). The four cysteine residues in the VWC domain 
that are absent in WISP-3 are indicated with a dot. (C) Expression of 
WISP mRNA in human tissues. PCR was performed on human 
multiple-tissue cDNA panels (CLONTECH) from the indicated adult 
and fetal tissues. 

binding proteins (BP). This sequence is conserved in WISP-2 
and WISP-3, whereas WISP-1 has a glutamine in the third 
position instead of a glycine. CTGF recently has been shown 
to specifically bind IGF (22) and a truncated nov protein 
lacking the IGF-BP domain is oncogenic (23). The von Wil- 
lebrand factor type C module (VWC), also found in certain 
collagens and mucins, covers the next 10 cysteine residues, and 
is thought to participate in protein complex formation and 
oligomerization (24). The VWC domain of WISP-3 differs 
from all CCN family members described previously, in that it 
contains only six of the 10 cysteine residues (Fig. 3 A and B). 
A short variable region follows the VWC domain. The third 
module, the thrombospondin (TSP) domain is involved in 
binding to sulfated glycoconjugates and contains six cysteine 
residues and a conserved WSxCSxxCG motif first identified in 
thrombospondin (25). The C-terminal (CT) module contain- 
ing the remaining 10 cysteines is thought to be involved in 
dimerization and receptor binding (26). The CT domain is 
present in all CCN family members described to date but is 
absent in WISP-2 (Fig. 3 A and B). The existence of a putative 
signal sequence and the absence of a transmembrane domain 
suggest that WISPs are secreted proteins, an observation 
supported by an analysis of their expression and secretion from 
mammalian cell and baculovirus cultures (data not shown). 
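
The two consensus motifs named above (GCGCCXXC in the IGF-BP domain and WSxCSxxCG in the TSP domain) lend themselves to a simple pattern search. The following is a minimal sketch, assuming that X (or x) stands for any single residue. The helper names are hypothetical, and the peptide fragments are made up for illustration; they are not real WISP sequences.

```python
import re

def consensus_to_regex(consensus):
    """Translate a consensus such as 'GCGCCXXC' into a compiled regex.

    'X' or 'x' is treated as a wildcard for any single residue (an assumption).
    """
    parts = ("[A-Z]" if ch in "Xx" else re.escape(ch) for ch in consensus)
    return re.compile("".join(parts))

IGFBP_MOTIF = consensus_to_regex("GCGCCXXC")
TSP_MOTIF = consensus_to_regex("WSxCSxxCG")

def find_motif(pattern, sequence):
    """Return 0-based start positions of every match, including overlapping ones."""
    lookahead = f"(?=({pattern.pattern}))"
    return [m.start() for m in re.finditer(lookahead, sequence)]

# Made-up fragments, for illustration only.
print(find_motif(IGFBP_MOTIF, "AAGCGCCKVCTT"))  # strict consensus present
print(find_motif(IGFBP_MOTIF, "AAGCQCCKVCTT"))  # Gln in third position
print(find_motif(TSP_MOTIF, "PWSACSKTCGM"))
```

The second fragment carries a glutamine in the third motif position, as the text describes for WISP-1, so the strict consensus does not match it; a relaxed pattern would be needed to capture such variants.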

Expression of WISP mRNA in Human Tissues. Tissue- 
specific expression of human WISPs was characterized by PCR 
analysis on adult and fetal multiple tissue cDNA panels.
WISP-1 expression was seen in the adult heart, kidney, lung,
pancreas, placenta, ovary, small intestine, and spleen (Fig. 3C). 
Little or no expression was detected in the brain, liver, skeletal 
muscle, colon, peripheral blood leukocytes, prostate, testis, or 
thymus. WISP-2 had a more restricted tissue expression and 
was detected in adult skeletal muscle, colon, ovary, and fetal 
lung. Predominant expression of WISP-3 was seen in adult 
kidney and testis and fetal kidney. Lower levels of WISP-3 
expression were detected in placenta, ovary, prostate, and 
small intestine. 

In Situ Localization of WISP-1 and WISP-2. Expression of 
WISP-1 and WISP-2 was assessed by in situ hybridization in 
mammary tumors from Wnt-1 transgenic mice. Strong expres- 
sion of WISP-1 was observed in stromal fibroblasts lying within 
the fibrovascular tumor stroma (Fig. 4 A-D). However, low- 
level WISP-1 expression also was observed focally within tumor 
cells (data not shown). No expression was observed in normal 
breast. Like WISP-1, WISP-2 expression also was seen in the 
tumor stroma in breast tumors from Wnt-1 transgenic animals 
(Fig. 4 E-H). However, WISP-2 expression in the stroma was 
in spindle-shaped cells adjacent to capillary vessels, whereas 




Fig. 4. (A, C, E, and G) Representative hematoxylin/eosin-stained 
images from breast tumors in Wnt-1 transgenic mice. The correspond- 
ing dark-field images showing WISP-1 expression are shown in B and 
D. The tumor is a moderately well-differentiated adenocarcinoma 
showing evidence of adenoid cystic change. At low power (A and B), 
expression of WISP-1 is seen in the delicate branching fibrovascular 
tumor stroma (arrowhead). At higher magnification, expression is seen 
in the stromal(s) fibroblasts (C and D), and tumor cells are negative.
Focal expression of WISP-1, however, was observed in tumor cells in 
some areas. Images of WISP-2 expression are shown in E-H. At low 
power (E and F), expression of WISP-2 is seen in cells lying within the 
fibrovascular tumor stroma. At higher magnification, these cells 
appeared to be adjacent to capillary vessels whereas tumor cells are 
negative (G and H). 



the predominant cell type expressing WISP-1 was the stromal 
fibroblasts. 

Chromosome Localization of the WISP Genes. The chro- 
mosomal location of the human WISP genes was determined 
by radiation hybrid mapping panels. WISP-1 is approximately 
3.48 cR from the meiotic marker AFM259xc5 [logarithm of 
odds (lod) score 16.31] on chromosome 8q24.1 to 8q24.3, in the 
same region as the human locus of the novH family member
(27) and roughly 4 Mbs distal to c-myc (28). Preliminary fine 
mapping indicates that WISP-1 is located near D8S1712 STS. 
WISP-2 is linked to the marker SHGC-33922 (lod = 1,000) on 
chromosome 20q12-20q13.1. Human WISP-3 mapped to chro-
mosome 6q22-6q23 and is linked to the marker AFM211ze5
(lod = 1,000). WISP-3 is approximately 18 Mbs proximal to 
CTGF and 23 Mbs proximal to the human cellular oncogene 
MYB (27, 29). 

Amplification and Aberrant Expression of WISPs in Human 
Colon Tumors. Amplification of protooncogenes is seen in 
many human tumors and has etiological and prognostic sig- 
nificance. For example, in a variety of tumor types, c-myc 
amplification has been associated with malignant progression 
and poor prognosis (30). Because WISP-1 resides in the same 
general chromosomal location (8q24) as c-myc, we asked 
whether it was a target of gene amplification, and, if so, 
whether this amplification was independent of the c-myc locus. 
Genomic DNA from human colon cancer cell lines was 
assessed by quantitative PCR and Southern blot analysis (Fig.
5 A and B). Both methods detected similar degrees of WISP-1
amplification. Most cell lines showed significant (2- to 4-fold) 
amplification, with the HT-29 and WiDr cell lines demonstrat- 
ing an 8-fold increase. Significantly, the pattern of amplifica- 
tion observed did not correlate with that observed for c-myc, 
indicating that the c-myc gene is not part of the amplicon that 
involves the WISP-1 locus. 

We next examined whether the WISP genes were amplified 
in a panel of 25 primary human colon adenocarcinomas. The 
relative WISP gene copy number in each colon tumor DNA 
was compared with pooled normal DNA from 10 donors by 
quantitative PCR (Fig. 6). The copy number of WISP-1 and 
WISP-2 was significantly greater than one, approximately 
2-fold for WISP-1 in about 60% of the tumors and 2- to 4-fold 
for WISP-2 in 92% of the tumors (P < 0.001 for each). The 
copy number for WISP-3 was indistinguishable from one (P =
0.166). In addition, the copy number of WISP-2 was signifi-
cantly higher than that of WISP-1 (P < 0.001).

The levels of WISP transcripts in RNA isolated from 19 
adenocarcinomas and their matched normal mucosa were 
Fig. 5. Amplification of WISP-1 genomic DNA in colon cancer cell
lines. (A) Amplification in cell line DNA was determined by quanti-
tative PCR. (B) Southern blots containing genomic DNA (10 μg)
digested with EcoRI (WISP-1) or XbaI (c-myc) were hybridized with
a 100-bp human WISP-1 probe (amino acids 186-219) or a human 
c-myc probe (located at bp 1901-2000). The WISP and myc genes are 
detected in normal human genomic DNA after a longer film exposure. 
Fig. 6. Genomic amplification of WISP genes in human colon
tumors. The relative gene copy number of the WISP genes in 25 
adenocarcinomas was assayed by quantitative PCR, by comparing 
DNA from primary human tumors with pooled DNA from 10 healthy 
donors. The data are means ± SEM from one experiment done in 
triplicate. The experiment was repeated at least three times. 

assessed by quantitative PCR (Fig. 7). The level of WISP-1 
RNA present in tumor tissue varied but was significantly 
increased (2- to >25-fold) in 84% (16/19) of the human colon 
tumors examined compared with normal adjacent mucosa. 
Four of 19 tumors showed greater than 10-fold overexpression. 
In contrast, in 79% (15/19) of the tumors examined, WISP-2 
RNA expression was significantly lower in the tumor than the 
mucosa. Similar to WISP-1, WISP-3 RNA was overexpressed in
63% (12/19) of the colon tumors compared with the normal 
Fig. 7. WISP RNA expression in primary human colon tumors
relative to expression in normal mucosa from the same patient. 
Expression of WISP mRNA in 19 adenocarcinomas was assayed by 
quantitative PCR. The Dukes stage of the tumor is listed under the 
sample number. The data are means ± SEM from one experiment 
done in triplicate. The experiment was repeated at least twice. 



mucosa. The amount of overexpression of WISP-3 ranged from 
4- to >40-fold. 
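
The percentages reported for the 19 matched tumor/mucosa pairs follow directly from the stated counts. A short arithmetic check is sketched below; the counts are those given in the text, while the dictionary labels and the helper name are hypothetical.

```python
def percent(count, total):
    """Whole-number percentage of count out of total."""
    return round(100.0 * count / total)

TUMORS = 19
reported_counts = {
    "WISP-1 increased": 16,
    "WISP-2 decreased": 15,
    "WISP-3 increased": 12,
}

for label, n in reported_counts.items():
    print(f"{label}: {n}/{TUMORS} = {percent(n, TUMORS)}%")
```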

DISCUSSION 

One approach to understanding the molecular basis of cancer 
is to identify differences in gene expression between cancer 
cells and normal cells. Strategies based on assumptions that 
steady-state mRNA levels will differ between normal and 
malignant cells have been used to clone differentially ex- 
pressed genes (31). We have used a PCR-based selection 
strategy, SSH, to identify genes selectively expressed in 
C57MG mouse mammary epithelial cells transformed by 
Wnt-1. 

Three of the genes isolated, WISP-1, WISP-2, and WISP-3,
are members of the CCN family of growth factors, which 
includes CTGF, Cyr61, and nov, a family not previously linked 
to Wnt signaling. 

Two independent experimental systems demonstrated that 
WISP induction was associated with the expression of Wnt-1. 
The first was C57MG cells infected with a Wnt-1 retroviral 
vector or C57MG cells expressing Wnt-1 under the control of 
a tetracycline-repressible promoter, and the second was in
Wnt-1 transgenic mice, where breast tissue expresses Wnt-1, 
whereas normal breast tissue does not. No WISP RNA expres- 
sion was detected in mammary tumors induced by polyoma 
virus middle T antigen (data not shown). These data suggest 
a link between Wnt-1 and WISPs in that in these two situations, 
WISP induction was correlated with Wnt-1 expression. 

It is not clear whether the WISPs are directly or indirectly 
induced by the downstream components of the Wnt-1 signaling 
pathway (i.e., β-catenin-TCF-1/Lef1). The increased levels of
WISP RNA were measured in Wnt-1-transformed cells, hours
or days after Wnt-1 transformation. Thus, WISP expression
could result from Wnt-1 signaling directly through β-catenin
transcription factor regulation or alternatively through Wnt-1 
signaling turning on a transcription factor, which in turn 
regulates WISPs. 

The WISPs define an additional subfamily of the CCN family 
of growth factors. One striking difference observed in the 
protein sequence of WISP-2 is the absence of a CT domain, 
which is present in CTGF, Cyr61, nov, WISP-1, and WISP-3. 
This domain is thought to be involved in receptor binding and 
dimerization. Growth factors, such as TGF-β, platelet-derived
growth factor, and nerve growth factor, which contain a cystine 
knot motif exist as dimers (32). It is tempting to speculate that 
WISP-1 and WISP-3 may exist as dimers, whereas WISP-2 
exists as a monomer. If the CT domain is also important for 
receptor binding, WISP-2 may bind its receptor through a 
different region of the molecule than the other CCN family 
members. No specific receptors have been identified for CTGF 
or nov. A recent report has shown that integrin αvβ3 serves as
an adhesion receptor for Cyr61 (33). 

The strong expression of WISP-1 and WISP-2 in cells lying 
within the fibrovascular tumor stroma in breast tumors from 
Wnt-1 transgenic animals is consistent with previous obser- 
vations that transcripts for the related CTGF gene are pri- 
marily expressed in the fibrous stroma of mammary tumors 
(34). Epithelial cells are thought to control the proliferation of 
connective tissue stroma in mammary tumors by a cascade of 
growth factor signals similar to that controlling connective 
tissue formation during wound repair. It has been proposed 
that mammary tumor cells or inflammatory cells at the tumor 
interstitial interface secrete TGF-β1, which is the stimulus for
stromal proliferation (34). TGF-β1 is secreted by a large
percentage of malignant breast tumors and may be one of the 
growth factors that stimulates the production of CTGF and 
WISPs in the stroma. 

It was of interest that WISP-1 and WISP-2 expression was 
observed in the stromal cells that surrounded the tumor cells 
(epithelial cells) in the Wnt-1 transgenic mouse sections of
breast tissue. This finding suggests that paracrine signaling 
could occur in which the stromal cells could supply WISP-1 and 
WISP-2 to regulate tumor cell growth on the WISP extracel- 
lular matrix. Stromal cell-derived factors in the extracellular 
matrix have been postulated to play a role in tumor cell 
migration and proliferation (35). The localization of WISP-1 
and WISP-2 in the stromal cells of breast tumors supports this 
paracrine model. 

An analysis of WISP-1 gene amplification and expression in 
human colon tumors showed a correlation between DNA 
amplification and overexpression, whereas overexpression of 
WISP-3 RNA was seen in the absence of DNA amplification. 
In contrast, WISP-2 DNA was amplified in the colon tumors, 
but its mRNA expression was significantly reduced in the 
majority of tumors compared with the expression in normal 
colonic mucosa from the same patient. The gene for human 
WISP-2 was localized to chromosome 20q12-20q13, at a region
frequently amplified and associated with poor prognosis in 
node negative breast cancer and many colon cancers, suggest- 
ing the existence of one or more oncogenes at this locus 
(36-38). Because the center of the 20q13 amplicon has not yet
been identified, it is possible that the apparent amplification 
observed for WISP-2 may be caused by another gene in this 
amplicon. 

A recent manuscript on rCop-1, the rat orthologue of 
WISP-2, describes the loss of expression of this gene after cell 
transformation, suggesting it may be a negative regulator of 
growth in cell lines (16). Although the mechanism by which 
WISP-2 RNA expression is down-regulated during malignant 
transformation is unknown, the reduced expression of WISP-2 
in colon tumors and cell lines suggests that it may function as 
a tumor suppressor. These results show that the WISP genes 
are aberrantly expressed in colon cancer and suggest that their 
altered expression may confer selective growth advantage to 
the tumor. 

Members of the Wnt signaling pathway have been impli- 
cated in the pathogenesis of colon cancer, breast cancer, and 
melanoma, including the tumor suppressor gene adenomatous 
polyposis coli and β-catenin (39). Mutations in specific regions
of either gene can cause the stabilization and accumulation of
cytoplasmic β-catenin, which presumably contributes to hu-
man carcinogenesis through the activation of target genes such
as the WISPs. Although the mechanism by which Wnt-1 
transforms cells and induces tumorigenesis is unknown, the 
identification of WISPs as genes that may be regulated down- 
stream of Wnt-1 in C57MG cells suggests they could be 
important mediators of Wnt-1 transformation. The amplifica- 
tion and altered expression patterns of the WISPs in human 
colon tumors may indicate an important role for these genes 
in tumor development. 
Proteome analysis: Biological assay or data archive?

In this review we examine the current state of proteome analysis. There are 
three main issues discussed: why it is necessary to study proteomes; how pro- 
teomes can be analyzed with current technology; and how proteome analysis 
can be used to enhance biological research. We conclude that proteome anal- 
ysis is an essential tool in the understanding of regulated biological systems. 
Current technology, while still mostly limited to the more abundant proteins, 
enables the use of proteome analysis both to establish databases of proteins 
present, and to perform biological assays involving measurement of multiple 
variables. We believe that the utility of proteome analysis in future biological 
research will continue to be enhanced by further improvements in analytical 
technology. 
1 Introduction

A proteome has been defined as the protein complement 
expressed by the genome of an organism, or, in multicel- 
lular organisms, as the protein complement expressed by a 
tissue or differentiated cell [1]. In the most common im-
plementation of proteome analysis the proteins extracted
from the cell or tissue analyzed are separated by high 
resolution two-dimensional gel electrophoresis (2-DE),
detected in the gel and identified by their amino acid 
sequence. The ease, sensitivity and speed with which gel- 
separated proteins can be identified by the use of recently 
developed mass spectrometric techniques have dramati- 
cally increased the interest in proteome technology. One 
of the most attractive features of such analyses is that com- 
plex biological systems can potentially be studied in their 
entirety, rather than as a multitude of individual compo- 
nents. This makes it far easier to uncover the many com- 
plex, and often obscure, relationships between mature
gene products in cells. Large-scale proteome characteriza- 
tion projects have been undertaken for a number of dif- 
ferent organisms and cell types. Microbial proteome pro- 
jects currently in progress include, for example: Saccharo-
myces cerevisiae [2], Salmonella enterica [3], Spiroplasma
melliferum [4], Mycobacterium tuberculosis [5], Ochrobac- 
trum anthropi [6], Haemophilus influenzae [7], Synecho- 
cystis spp. [8], Escherichia coli [9], Rhizobium legumino- 
sarum [10], and Dictyostelium discoideum [11]. Proteome 
projects underway for tissues of more complex organ- 
isms include those for: human bladder squamous cell 
carcinomas [12], human liver [13], human plasma [13], 
human keratinocytes [12], human fibroblasts [12], mouse 
kidney [12], and rat serum [14]. In this manuscript we cri- 
tically assess the concept of proteome analysis and the 
technical feasibility of establishing complete proteome 
maps, and discuss ways in which proteome analysis and 
biological research intersect. 

2 Rationale for proteome analysis 

The dramatic growth in both the number of genome 
projects and the speed with which genome sequences 
are being determined has generated huge amounts of 
sequence information, for some species even complete 
genomic sequences ([15-17]). The description of the
state of a biological system by the quantitative measure- 
ment of system components has long been a primary 
objective in molecular biology. With recent technical 
advances including the development of differential dis- 
play-PCR [18], cDNA microarray and DNA chip techno- 
logy [19, 20] and serial analysis of gene expression 
(SAGE) [21, 22], it is now feasible to establish global and 
quantitative mRNA expression maps of cells and tissues, 
in which the sequence of all the genes is known, at a 
speed and sensitivity which is not matched by current 
protein analysis technology. Given the long-standing
paradigm in biology that DNA synthesizes RNA which 
synthesizes protein, and the ability to rapidly establish 
global, quantitative mRNA expression maps, the ques- 
tions which arise are why technically complex proteome 
projects should be undertaken and what specific types of 
information could be expected from proteome projects 
which cannot be obtained from genomic and transcript 
profiling projects. We see three main reasons for pro-
teome analysis to become an essential component in the
comprehensive analysis of biological systems: (i) protein
expression levels are not predictable from the mRNA 
expression levels, (ii) proteins are dynamically modified 
and processed in ways which are not necessarily 
apparent from the gene sequence, and (iii) proteomes 
are dynamic and reflect the state of a biological system. 

2.1 Correlation between mRNA and protein expression 
levels 

Interpretations of quantitative mRNA expression profiles 
frequently implicitly or explicitly assume that for specific 
genes the transcript levels are indicative of the levels of 
protein expression. As part of an ongoing study in our 
laboratory, we have determined the correlation of expres- 
sion at the mRNA and protein levels for a population of 
selected genes in the yeast Saccharomyces cerevisiae 
growing at mid-log phase (S. P. Gygi et al., submitted for 
publication). mRNA expression levels were calculated 
from published SAGE frequency tables [22]. Protein
expression levels were quantified by metabolic radiola- 
beling of the yeast proteins, liquid scintillation counting 
of the protein spots separated by high resolution 2-DE 
and mass spectrometric identification of the protein(s) 
migrating to each spot. The selected 80 samples consti- 
tute a relatively homogeneous group with respect to pre- 
dicted half-life and expression level of the protein pro- 
ducts. Thus far, we have found a general trend but no 
strong correlation between protein and transcript levels 
(Fig. 1). For some genes studied equivalent mRNA trans- 
cript levels translated into protein abundances which 
varied by more than 50-fold. Similarly, equivalent steady- 
state protein expression levels were maintained by trans- 
cript levels varying by as much as 40-fold (S. P. Gygi 
et al., submitted). These results suggest that even for a
population of genes predicted to be relatively homoge- 
neous with respect to protein half-life and gene expres- 
sion, the protein levels cannot be accurately predicted 
from the level of the corresponding mRNA transcript. 
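
One common way to quantify the kind of mRNA-to-protein relationship discussed above is a correlation coefficient computed on log-transformed abundances. The sketch below is illustrative only: the text does not state which statistic was used in the study, the choice of a Pearson coefficient on log10 values is an assumption, and the abundance values are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def fold_range(values):
    """Ratio of the largest to the smallest value (all values must be > 0)."""
    return max(values) / min(values)

# Invented abundances (arbitrary units), for illustration only.
mrna = [1.0, 1.0, 2.0, 4.0, 8.0, 8.0]
protein = [2.0, 100.0, 30.0, 15.0, 400.0, 20.0]

r = pearson_r([math.log10(m) for m in mrna],
              [math.log10(p) for p in protein])
print(f"r (log-log) = {r:.2f}")
# Spread of protein levels among genes sharing the same mRNA level:
print(f"protein fold range at mRNA = 1.0: {fold_range(protein[:2]):.0f}-fold")
```

In the invented data, two genes with identical transcript levels differ 50-fold in protein abundance, the kind of spread the text describes.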

2.2 Proteins are dynamically modified and processed 

In the mature, biologically active form many proteins are 
post-translationally modified by glycosylation, phosphor-
ylation, prenylation, acylation, ubiquitination or one or
more of many other modifications [23] and many pro- 
teins are only functional if specifically associated or com- 
plexed with other molecules, including DNA, RNA, pro- 
teins and organic and inorganic cofactors. Frequently, 
modifications are dynamic and reversible and may alter 
the precise three-dimensional structure and the state of 
activity of a protein. Collectively, the state of modifica- 
tion of the proteins which constitute a biological system 
is an important indicator of the state of the system. The
type of protein modification and the sites modified at a 
specific cellular state can usually not be determined 
from the gene sequence alone. 

2.3 Proteomes are dynamic and reflect the state of a
biological system 

A single genome can give rise to many qualitatively and 
quantitatively different proteomes. Specific stages of the 
cell cycle and states of differentiation, responses to 
growth and nutrient conditions, temperature and stress, 
and pathological conditions represent cellular states 
which are characterized by significantly different pro-
teomes. The proteome, in principle, also reflects events
that are under translational and post-translational con-
trol. It is therefore expected that proteomics will be able
to provide the most precise and detailed molecular des- 
cription of the state of a cell or tissue, provided that the 
external conditions defining the state are carefully deter- 
mined. In answer to the question of whether the study 
of proteomes is necessary for the analysis of biomolec- 
ular systems, it is evident that the analysis of mature pro- 
tein products in cells is essential as there are numerous 
levels of control of protein synthesis, degradation,
processing and modification, which are only apparent by 
direct protein analysis. 



3 Description and assessment of current proteome 
analysis technology 

3.1 Technical requirements of proteome technology 

In biological systems the level of expression as well as 
the states of modification, processing and macro-molec- 
ular association of proteins are controlled and modu- 
lated depending on the state of the system. Comprehen- 
sive analysis of the identity, quantity and state of modifi- 
cation of proteins therefore requires the detection and 



1864    P. A. Haynes et al.    Electrophoresis 1998, 19, 1852-1871



quantitation of the proteins which constitute the system, 
and analysis of differentially processed forms. There are 
a number of inherent difficulties in protein analysis 
which complicate these tasks. First, proteins cannot be 
amplified. It is possible to produce large amounts of a 
particular protein by over-expression in specific cell sys- 
tems. However, since many proteins are dynamically 
post-translationally modified, they cannot be easily am- 
plified in the form in which they finally function in the 
biological system. It is frequently difficult to purify from 
the native source sufficient amounts of a protein for 
analysis. From a technological point of view this trans- 
lates into the need for high sensitivity analytical tech- 
niques. Second, many proteins are modified and pro-
cessed post-translationally. Therefore, in addition to the 
protein identity, the structural basis for differentially 
modified isoforms also needs to be determined. The dis- 
tribution of a constant amount of protein over several 
differentially modified isoforms further reduces the 
amount of each species available for analysis. The com-
plexity and dynamics of post-translational protein edit-
ing thus significantly complicate proteome studies.
Third, proteins vary dramatically with respect to their 
solubility in commonly used solvents. There are few, if 
any, solvent conditions in which all proteins are soluble
and which are also compatible with protein analysis. This 
makes the development of protein purification methods 
particularly difficult since both protein purification and 
solubility have to be achieved under the same condi- 
tions. Detergents, in particular sodium dodecyl sulfate 
(SDS), are frequently added to aqueous solvents to 
maintain protein solubility. The compatibility with SDS 
is a big advantage of SDS polyacrylamide gel electro- 
phoresis (SDS-PAGE) over other protein separation 
techniques. Thus, SDS-PAGE and two-dimensional gel 
electrophoresis, which also uses SDS and other deter- 
gents, are the most general and preferred methods for 
the purification of small amounts of proteins, provided 
that activity does not necessarily need to be maintained. 
Lastly, the number of proteins in a given cell system is 
typically in the thousands. Any attempt to identify and 
categorize all of these must use methods which are as 
rapid as possible to allow completion of the project 
within a reasonable time frame. Therefore, a successful, 
general proteomics technology requires high sensitivity, 
high throughput, the ability to differentiate differentially 
modified proteins, and the ability to quantitatively dis- 
play and analyze all the proteins present in a sample. 

3.2 2-D electrophoresis - mass spectrometry: a common
implementation of proteome analysis 

The most common currently used implementation of 
proteome analysis technology is based on the separation 
of proteins by two-dimensional (IEF/SDS-PAGE) gel 
electrophoresis and their subsequent identification and 
analysis by mass spectrometry (MS) or tandem mass 
spectrometry (MS/MS). In 2-DE, proteins are first separ- 
ated by isoelectric focusing (IEF) and then by SDS- 
PAGE, in the second, perpendicular dimension. Separ- 
ated proteins are visualized at high sensitivity by staining 
or autoradiography, producing two-dimensional arrays of 
proteins. 2-DE gels are, at present, the most commonly 
used means of global display of proteins in complex
samples. The separation of thousands of proteins has
been achieved in a single gel [24, 25] and differentially 
modified proteins are frequently separated. Due to the 
compatibility of 2-DE with high concentrations of deter- 
gents, protein denaturants and other additives promoting 
protein solubility, the technique is widely used. 

The second step of this type of proteome analysis is the 
identification and analysis of separated proteins. Individ- 
ual proteins from polyacrylamide gels have traditionally 
been identified using N-terminal sequencing [26, 27],
internal peptide sequencing [28, 29], immunoblotting or 
comigration with known proteins [30]. The recent dra- 
matic growth of large-scale genomic and expressed 
sequence tag (EST) sequence databases has resulted in a 
fundamental change in the way proteins are identified by 
their amino acid sequence. Rather than by the traditional 
methods described above, protein sequences are now fre- 
quently determined by correlating mass spectral or 
tandem mass spectral data of peptides derived from pro- 
teins, with the information contained in sequence data- 
bases [31-33]. 

There are a number of alternative approaches to pro- 
teome analysis currently under development. There is 
considerable interest in developing a proteome analysis 
strategy which bypasses 2-DE altogether, because it is
considered a relatively slow and tedious process, and
because of perceived difficulties in extracting proteins
from the gel matrix for analysis. However, 2-DE as a
starting point for proteome analysis has many advan-
tages compared to other techniques available today. The
most significant strengths of the 2-DE-MS approach 
include the relatively uniform behavior of proteins in 
gels, the ability to quantify spots and the high resolution 
and simultaneous display of hundreds to thousands of 
proteins within a reasonable time frame. 

A schematic diagram of a typical procedure of the identi- 
fication of gel-separated proteins is shown in Fig. 2. Pro- 
tein spots detected in the gel are enzymatically or chemi- 
cally fragmented and the peptide fragments are isolated 
for analysis, as already indicated, most frequently by MS 
or MS/MS. There are numerous protocols for the gener- 
ation of peptide fragments from gel-separated proteins. 
They can be grouped into two categories, digestion in 
the gel slice [28, 34] or digestion after electrotransfer out 
of the gel onto a suitable membrane ([29, 35-37] and 
reviewed in [38]). In most instances either technique is 
applicable and yields good results. The analysis of MS or 
MS/MS data is an important step in the whole process 
because MS instruments can generate an enormous 
amount of information which cannot easily be managed 
manually. Recently, a number of groups have developed 
software systems dedicated to the use of peptide MS 
and MS/MS spectra for the identification of proteins.
Proteins are identified by correlating the information
contained in the MS spectra of protein digests or 
MS/MS spectra of individual peptides with data con- 
tained in DNA or protein sequence databases. 
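
The correlation of peptide masses from a protein digest with a sequence database can be illustrated with a minimal sketch. It assumes digestion with trypsin (cleavage after K or R, but not before P), monoisotopic residue masses, and a simple count of matching masses as the score. The function names, the mass tolerance and the toy database are hypothetical and are not taken from any of the software systems cited above.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added to the summed residue masses


def tryptic_peptides(protein):
    """Split a sequence after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, protein, tol=0.5):
    """Number of observed masses explained by the protein's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(
        any(abs(obs - theo) <= tol for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tol=0.5):
    """Name of the database entry that explains the most observed masses."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tol),
    )
```

A call such as `best_match(masses, {"protA": "AAAKGGGR", "protB": "WWWKYYYR"})` returns the entry whose predicted digest agrees best with the measured masses. Real search programs also handle missed cleavages, modifications and statistical scoring, which this sketch omits.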

Figure 2. Schematic diagram of a procedure for identification of gel-
separated proteins. Peptides can either be separated by a technique
such as LC or CE, or infused as a mixture and sorted in the MS. Data-
base searching can either be performed on peptide masses from an
MS spectrum, peptide fragment masses from CID spectra of peptides,
or a combination of both.

The systems we are currently using in our laboratory are
based on the separation of the peptides contained in pro-
tein digests by narrow bore or capillary liquid chromatog-
raphy [39, 40] or capillary electrophoresis [41], the anal-
ysis of the separated peptides by electrospray ioniza- 
tion (ESI) MS/MS, and the correlation of the generated 
peptide spectra with sequence databases using the 
SEQUEST program developed at the University of Wash- 
ington [32, 33]. The system automatically performs the 
following operations: a particular peptide ion character- 
ized by its mass-to-charge ratio is selected in the MS out 
of all the peptide ions present in the system at a parti- 
cular time; the selected peptide ion is collided in a colli- 
sion cell with argon (collision-induced dissociation, 
CID) and the masses of the resulting fragment ions are 
determined in the second sector of the tandem MS; this 
experimentally determined CID spectrum is then corre- 
lated with the CID spectra predicted from all the pep- 
tides in a sequence database which have essentially the 
same mass as the peptide selected for CID; this correla- 
tion matches the isolated peptide with a sequence seg- 
ment in a database and thus identifies the protein from 
which the peptide was derived. There are a number of 
alternative programs which use peptide CID spectra for 
protein identification, but we use the SEQUEST system 
because it is currently the most highly automated pro- 
gram and has proven to be successful, versatile and 
robust. 
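
The two-stage logic described above (restrict the database to peptides of essentially the same mass as the selected ion, then compare predicted and measured CID spectra) can be sketched as follows. This is a simplified stand-in and not the SEQUEST algorithm: it predicts only singly charged b- and y-ions and scores candidates by a shared peak count, whereas SEQUEST uses a cross-correlation score. All function names and tolerances are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Predicted m/z values of singly charged b- and y-ions."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def shared_peak_count(observed_mz, peptide, frag_tol=0.5):
    """Number of observed fragment peaks explained by the candidate peptide."""
    predicted = fragment_ions(peptide)
    return sum(
        any(abs(obs - pred) <= frag_tol for pred in predicted)
        for obs in observed_mz
    )


def identify(observed_mz, precursor_mass, candidates, mass_tol=1.0, frag_tol=0.5):
    """Return the best-scoring candidate, or None if no mass matches."""
    # Stage 1: keep peptides with essentially the same mass as the selected ion.
    same_mass = [
        p for p in candidates if abs(peptide_mass(p) - precursor_mass) <= mass_tol
    ]
    if not same_mass:
        return None
    # Stage 2: rank the remaining peptides by agreement with the CID spectrum.
    return max(same_mass, key=lambda p: shared_peak_count(observed_mz, p, frag_tol))
```

The mass filter alone cannot separate peptides of identical composition such as GASK and SAGK; the fragment comparison in the second stage is what distinguishes them.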

3.3 Protein identification by LC-MS/MS, capillary
LC-MS/MS and CE-MS/MS 

It has been demonstrated repeatedly that MS has a very 
high intrinsic sensitivity. For the routine analysis of gel- 
separated proteins at high sensitivity, the most signif- 
icant challenge is the handling of small amounts of 
sample. The crux of the problem is the extraction and 
transferal of peptide mixtures generated by the digestion 
of low nanogram amounts of protein, from gels into the 
MS/MS system without significant loss of sample or 
introduction of unwanted contaminants. We employ 
three different systems for introducing gel-purified sam- 
ples into an MS, depending on the level of sensitivity 
required. As an approximate guideline, for samples con-
taining tens of picomoles of peptides, LC-MS/MS is 
most appropriate; for samples containing low picomole 
amounts to high femtomole amounts we use capillary 
LC-MS/MS; and for samples containing femtomoles or 
less, CE-MS/MS is the method of choice. 
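
The guideline above can be written as a small decision function. The text gives only approximate ranges ("tens of picomoles", "low picomole to high femtomole", "femtomoles or less"), so the numeric cutoffs below are illustrative assumptions, as is the function name.

```python
def choose_ms_interface(peptide_amount_fmol):
    """Suggest a sample introduction method from the peptide amount in fmol.

    The cutoffs are assumed values chosen to mirror the approximate ranges
    in the text; they are not specified by the authors.
    """
    if peptide_amount_fmol < 0:
        raise ValueError("amount must be non-negative")
    if peptide_amount_fmol >= 10_000:   # tens of picomoles
        return "LC-MS/MS"
    if peptide_amount_fmol >= 100:      # low picomole to high femtomole
        return "capillary LC-MS/MS"
    return "CE-MS/MS"                   # femtomoles or less
```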

3.3.1 LC-MS/MS 

The coupling of an MS to an HPLC system using a 
0.5 mm diameter or bigger reverse phase (RP) column
has been described in detail [42]. This system has several 
advantages if a large number of samples are to be ana- 
lyzed and all are available in sufficient quantity. The 
LC-MS and database searching program can be run in a 
fully automated mode using an autosampler, thus maxi- 
mizing sample throughput and minimizing the need for 
operator interference. The relatively large column is 
tolerant of high levels of impurities from either gel prep- 
aration or sample matrix. Lastly, if configured with a 
flow-splitter and micro-sprayer [40], analyses can be per- 
formed on a small fraction of the sample (less than 5%) 
while the remainder of the sample is recovered in very 
pure solvents. This latter feature is particularly useful 
when an orthogonal technique is also used to analyze 
peptide fractions, such as scintillation of an introduced 
radiolabel, and this data can be correlated with peptides 
identified by CID spectra. 

3.3.2 Capillary LC-MS

An increase of sensitivity of approximately tenfold can be 
achieved by using a capillary LC system with a 100 µm ID
column rather than a 0.5 mm ID column as referred to 
above. Since very low flow rates are required for such 
columns, most reports have used a precolumn flow split- 
ting system for producing solvent gradients. We have 
recently described the design and construction of a novel
gradient mixing system which enables the formation 
of reproducible gradients at very low flow rates (low 
nL/min) without the need for flow splitting (A. Ducret 
et al., submitted for publication). Using this capillary
LC-MS/MS system we were able to identify gel-separat- 
ed proteins if low picomole to high femtomole amounts 
were loaded onto the gel [40]. This system is as yet not 
automated and, like all capillary LC systems, is prone to 
blockage of the columns by microparticulates when ana- 
lyzing gel-separated proteins. 

3.3.3 CE-MS/MS

The highest level of sensitivity for analyzing gel-sep- 
arated proteins can be achieved by using capillary elec- 
trophoresis - mass spectrometry (CE-MS). We have de- 
scribed in the past a solid-phase extraction capillary elec- 
trophoresis (SPE-CE) system which was used with triple 
quadrupole and ion trap ESI-MS/MS systems for the
identification of proteins at the low femtomole to sub- 
femtomole sensitivity level [43, 44]. While this system is 
highly sensitive, its operation is labor-intensive and has
not been automated. In order to devise an
analytical system with both the sensitivity of a CE and 
the level of automation of LC, we have constructed 
microfabricated devices for the introduction of samples
into ESI-MS for high-sensitivity peptide analysis.

Figure 3. Schematic illustration of a
microfabricated analytical system for CE,
consisting of a micromachined device,
coated capillary electroosmotic pump,
and microelectrospray interface. The
dimensions of the channels and reservoir
are as indicated in the text. The channels
on the device were graphically enhanced
to make them more visible. Reproduced
from [45], with permission.

The basic device is a piece of glass into which channels 
of 10-30 µm in depth and 50-70 µm in diameter are
etched by using photolithography/etching techniques 
similar to the ones used in the semiconductor industry. 
(A simple device is shown in Fig. 3). The channels are 
connected to an external high voltage power supply [45]. 
Samples are manipulated on the device and off the 
device to the MS by applying different potentials to the 
reservoirs. This creates a solvent flow by electroosmotic 
pumping which can be redirected by changing the posi- 
tion of the electrode. Therefore, without the need for 
valves or gates and without any external pumping, the 
flow can be redirected by simply switching the position 
of the electrodes on the device. The direction and rate of 
the flow can be modulated by the size and the polarity 
of the electric field applied and also by the charge state 
of the surface. 

The type of data generated by the system is illustrated in 
Fig. 4, which shows the mass spectrum of a peptide sample 
representing the tryptic digest of carbonic anhydrase at 
290 fmol/µL. Each numbered peak indicates a peptide suc-
cessfully identified as being derived from carbonic an-
hydrase. Some of the unassigned signals may be chemical
or peptide contaminants. The MS is programmed to auto- 
matically select each peak and subject the peptide to CID. 
The resulting CID spectra are then used to identify the 
protein by correlation with sequence databases. Therefore, 
this system allows us to concurrently apply a number of 
protein digests onto the device, to sequentially mobilize 
the samples, to automatically generate CID spectra of
selected peptide ions and to search sequence databases 
for protein identification. These steps are performed auto- 
matically without the need for user input and proteins can 
be identified at very low femtomole level sensitivity at a 
rate of approximately one protein per 15 min. 

Figure 4. MS spectrum of a tryptic digest
of carbonic anhydrase using the microfa-
bricated system shown in Fig. 3. 290
fmol/µL of carbonic anhydrase tryptic
digest was infused into a Finnigan LCQ
ion trap MS. Each peak was selected for
CID, and those which were identified as
containing peptides derived from car-
bonic anhydrase are numbered. Repro-
duced from [45], with permission.

Figure 5. 2-DE separation [...]
pH 3-10, and the second dimension [...]
procedures are included in S. P. Gygi et al. (submitted).

3.4 Assessment of 2-DE-MS proteome technology

Using a combination of the analytical techniques de-
scribed above we have identified the 80 protein spots
indicated in Fig. 5. The protein pattern was generated by
separating a total of 40 microgram of protein contained
in a total cell lysate of the yeast strain YPH499 by high
resolution 2-DE and silver staining of the separated pro-
teins. To estimate how far this type of proteome analysis
can penetrate towards the identification of low abun-
dance proteins, we have calculated the codon bias of the
genes encoding the respective proteins. Codon bias is a
calculated measure of the degree of redundancy of trip-
let DNA codons used to produce each amino acid in a 
particular gene sequence. It has been shown to be a 
useful indicator of the level of the protein product of a 
particular gene sequence present in a cell [46]. The gen- 
eral rule which applies is that the higher the value of the 
codon bias calculated for a gene, the more abundant the 
protein product of that gene becomes. The calculated 
codon bias values corresponding to the proteins identi- 
fied in Fig. 5 are shown in Fig. 6b. Nearly all of the pro- 
teins identified (> 95%) have codon bias values of > 0.2,
indicating they are highly abundant in cells. In contrast, 
codon bias values calculated for the entire yeast genome 
(Fig. 6a) show that the majority of proteins present in 
the proteome have a codon bias of < 0.2 and are thus of 
low abundance. 
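
The comparison made here amounts to asking what fraction of a set of genes exceeds a codon bias cutoff. A minimal sketch is given below. The threshold of 0.2 is the one used in the text; the function name is hypothetical, and the codon bias values themselves would have to come from the calculation referenced in [46], which is not reproduced here.

```python
def fraction_above(codon_bias_values, threshold=0.2):
    """Fraction of genes whose codon bias exceeds the threshold."""
    values = list(codon_bias_values)
    if not values:
        raise ValueError("no codon bias values supplied")
    return sum(v > threshold for v in values) / len(values)
```

Applied once to the identified spots and once to the whole genome, the two fractions make the sampling bias toward abundant proteins explicit.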

This finding is of considerable importance in our assess- 
ment of the current status of proteome analysis technol-
ogy. It is clear that even using highly sensitive analytical 
techniques, we are only able to visualize and identify the
more abundant proteins. Since many important regula-
tory proteins are present only at low abundance, these 
would not be amenable to analysis using such tech- 
niques. This situation would be exacerbated in the anal- 
ysis of proteomes containing many more proteins than 
the approximately 6000 gene products present in yeast 
cells [16]. In the analysis of, for example, the proteome 
of any human cells, there are potentially 50000-100000 
gene products [47]. Inherent limitations on the amount 
of protein that can be loaded on 2-DE, and the number 
of components that can be resolved, indicate that only 
the most highly abundant fraction of the many gene 
products could be successfully analyzed. One approach 
that has been employed to circumvent these limitations 
is the use of very narrow range immobilized pH gradient 
strips for the first-dimension separation of 2-DE [48]. 
Since only those proteins which focus within the narrow 
range will enter the second dimension of separation, a 
much higher sample loading within the desired range is 
possible. This, in turn, can lead to the visualization and 
identification of less abundant proteins. 

Figure 6. Calculated codon bias values for yeast proteins. (A) Distribu-
tion of calculated values for the entire yeast proteome. (B) Distribu-
tion of calculated values for the subset of 80 identified proteins also
shown in Figs. 1 and 5. Further details of experimental procedures are
included in S. P. Gygi et al. (submitted).



4 Utility of proteome analysis for biological 
research 

For the success of proteomics as a mainstream approach
to the analysis of biological systems it is essential to 
define how proteome analysis and biological research 
projects intersect. Without a clear plan for the implemen- 
tation of proteome-type approaches into biological re- 
search projects the full impact of the technology cannot
be realized. The literature indicates that proteome anal- 
ysis is used both as a database/data archive, and as a bio- 
logical assay or biological research tool. 

4.1 The proteome as a database 

The use of proteomics as a database or data archive 
essentially entails an attempt to identify all the proteins 
in a cell or species and to annotate each protein with the 
known biological information that is relevant for each 
protein. The level of annotation can, of course, be exten- 
sive. The most common implementation of this idea is
the separation of proteins by high resolution 2-DE, the 
identification of each detected protein spot and the 
annotation of the protein spots in a 2-DE gel database 
format. This approach is complicated by the fact that it is 
difficult to precisely define a proteome and to decide 
which proteome should be represented in the database. 
In contrast to the genome of a species, which is essen- 
tially static, the proteome is highly dynamic. Processes 
such as differentiation, cell activation and disease can all 
significantly change the proteome of a species. This is 
illustrated in Fig. 7. The figure shows two high-resolu-
tion 2-DE maps of proteins isolated from rat serum.
Fig. 7A is from the serum of normal rats, while Fig. 7B
is acute-phase serum from rats after prior treatment
with an inflammation-causing agent [49].
It is obvious that the protein patterns are significantly 
different in several areas, raising the question of exactly 
which proteome is being described. 

Therefore, a comprehensive proteome database of a spe- 
cies or cell type needs to contain all of the parameters 
which describe the state and the type of the cells from 
which the proteins were extracted as well as the software 
tools to search the database with queries which reflect 
the dynamics of biological systems. A comprehensive 
proteome database should be capable of quantitatively 
describing the fate of each protein if specific systems 
and pathways are activated in the cell. Specifically, the 
quantity, the degree of modification, the subcellular loca- 
tion and the nature of molecules specifically interacting 
with a protein as well as the rate of change of these 
variables should be described. Using these admittedly 
stringent criteria, there is currently no complete proteome
database. A number of such databases are, however, in 
the process of being constructed. The most advanced 
among them, in our opinion, are the yeast protein data- 
base YPD [50] (accessible at http://www.ypd.com) and 
the human 2D-PAGE databases of the Danish Centre 
for Human Genome Research [12] (accessible at
http://biobase.dk/cgi-bin/celis). While neither can be con-
sidered complete as not all of the potential gene pro- 
ducts are identified, both contain extensive annotation 
of supplemental information for many of the spots 
which are positively identified in reference samples.
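
The requirements listed above (cell type and state, quantity, degree of modification, subcellular location, interacting molecules, and the rate of change of these variables) suggest a record structure along the following lines. This is a hedged sketch only: the class and field names are hypothetical and do not describe the schema of YPD or of the 2D-PAGE databases mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinObservation:
    """One protein as observed in one defined cellular state."""
    protein_id: str
    quantity: float                                    # arbitrary units
    modifications: list = field(default_factory=list)  # e.g. ["phospho-S12"]
    subcellular_location: str = "unknown"
    interactors: list = field(default_factory=list)


@dataclass
class ProteomeSnapshot:
    """All observations made for one cell type, state and time point."""
    cell_type: str
    state: str
    time_h: float
    observations: dict = field(default_factory=dict)

    def add(self, obs):
        self.observations[obs.protein_id] = obs


def rate_of_change(earlier, later, protein_id):
    """Change in quantity per hour between two snapshots."""
    dt = later.time_h - earlier.time_h
    if dt <= 0:
        raise ValueError("snapshots must be ordered in time")
    before = earlier.observations[protein_id].quantity
    after = later.observations[protein_id].quantity
    return (after - before) / dt
```

Keying every observation to an explicit state and time point is what allows queries about dynamics, which a single static 2-DE map cannot answer.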

4.2 The proteome as a biological assay 

The use of proteome analysis as a biological assay or 
research tool represents an alternative approach to inte- 
grating biology with proteomics. To investigate the state
of a system, samples are subjected to a specific process
that allows the quantitative or qualitative measurement 
of some of the variables which describe the system. In 
typical biochemical assays one variable (e.g., enzyme 
activity) of a single component (e.g., a particular en-
zyme) is measured. Using proteomics as an assay, mul- 
tiple variables (e.g., expression level, rate of synthesis, 
phosphorylation state, etc.) are measured concurrently 
on many (ideally all) of the proteins in a sample. The 
use of proteomics as an assay is a less far-reaching prop- 
osition than the construction of a comprehensive pro- 
teome database. It does, however, represent a pragmatic 
approach which can be adapted to investigate specific 
systems and pathways, as long as the interpretation of 
the results takes into account that with current technol- 
ogy not all of the variables which describe the system 
can be observed (see Section 3.4). 

A common implementation of proteome analysis as a
biological assay is when a 2-DE protein pattern gener-
ated from the analysis of an experimental sample is
compared to an array of reference patterns representing
different states of the system under investigation. The
state of the experimental system at the time the sample
was generated is therefore determined by the quantita-
tive comparative analysis of hundreds to a few thousand
proteins. Comparative analysis of the 2-DE patterns fur-
thermore highlights quantitative and qualitative differ-
ences in the protein profiles which correlate with the
state of the system. For this type of analysis it is not
essential that all the proteins are identified or even visu-
alized, although the results become more informative as
more proteins are compared. It is obvious, however, that
the possibility to identify any protein deemed character-
istic for a particular state dramatically enhances this
approach by opening up new avenues for experimenta-
tion.

Figure 7. High resolution 2-DE map of proteins isolated from rat serum with or without prior exposure to an inflam-
mation-causing agent. (A) normal rat serum, (B) acute-phase serum from rats which had previously been exposed to
an inflammation-causing agent. The first dimension of separation is an IPG from pH 4-10, and the second dimen-
sion is a 7.5-17.5%T gradient SDS-PAGE gel. Proteins were visualized by staining with amido black. Further details
of experimental procedures are included in [14, 49].

Proteome analysis as a biological assay has been success-
fully used in the field of toxicology, to characterize 
disease states or to study differential activation of cells. 
The approach is limited, of course, by the fact that only 
the visible protein spots are included in the assay, and it
is well known that a substantial but far from complete 
fraction of cellular proteins are detected if a total cell 
lysate is separated by 2-DE. Proteins may not be 
detected in 2-DE gels because they are not abundant 
enough to be visualized by the detection method used, 
because they do not migrate within the boundaries (size, 
pI) resolved by the gel, because they are not soluble
under the conditions used, or for other reasons. 
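
The comparison of an experimental 2-DE pattern with reference patterns can be sketched as follows, under the assumption that each pattern has already been reduced to a mapping from matched spot identifiers to normalized intensities. The distance measure (root mean square difference over shared spots), the twofold cutoff and all names are illustrative choices, not a method described in the text. Spots missing from either pattern are ignored, which reflects the limitation just discussed: only visible spots enter the assay.

```python
import math


def pattern_distance(sample, reference):
    """Root mean square intensity difference over spots present in both."""
    shared = sample.keys() & reference.keys()
    if not shared:
        return math.inf
    squared = sum((sample[s] - reference[s]) ** 2 for s in shared)
    return math.sqrt(squared / len(shared))


def closest_state(sample, references):
    """Name of the reference state whose pattern is nearest to the sample."""
    return min(references, key=lambda state: pattern_distance(sample, references[state]))


def differing_spots(sample, reference, fold=2.0):
    """Shared spots whose intensity differs at least `fold` in either direction."""
    changed = []
    for spot in sorted(sample.keys() & reference.keys()):
        a, b = sample[spot], reference[spot]
        if a > 0 and b > 0 and max(a / b, b / a) >= fold:
            changed.append(spot)
    return changed
```

The spots returned by `differing_spots` are the natural candidates for subsequent identification by MS.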

A different way to use proteome analysis as a biological 
assay to define the state of a biological system is to take 
advantage of the wealth of information contained in 
2-DE protein patterns. 2-DE is referred to as two-dimen- 
sional because of the electrophoretic mobility and the 
isoelectric points which define the position of each pro- 
tein in a 2-DE pattern. In addition to the two dimen- 
sions used to generate the protein patterns, a number of 
additional data dimensions are contained in the protein 
patterns. Some of these dimensions such as protein 
expression level, phosphorylation state, subcellular loca- 
tion, association with other proteins, rate of synthesis or 
degradation indicate the activity state of a protein or a 
biological system. Comparative analysis of 2-DE protein 
patterns representing different states is therefore ideally 
suited for the detection, identification and analysis of 
suitable markers. Once again it must be emphasized that 
in this type of experiment only a fraction of the cellular 
proteins is analyzed. Since many regulatory proteins are 
of low abundance, this limitation is a concern, particu- 
larly in cases in which regulatory pathways are being 
investigated. 

5 Concluding remarks 

In this report we have addressed three main issues 
related to proteome analysis. First, we have discussed 
the rationale for studying proteomes. Second, we have 
assessed the technical feasibility of analyzing proteomes 
and described current proteome technology, and third, 
we have analyzed the utility of proteome analysis for bio- 
logical research. It is apparent that proteome analysis is 
an essential tool in the analysis of biological systems. 
The multi-level control of protein synthesis and degrada- 
tion in cells means that only the direct analysis of 
mature protein products can reveal their correct identi- 
ties, their relevant state of modification and/or associa- 
tion and their amounts. Recently developed methods 
have enabled the identification of proteins at ever- 
increasing sensitivity levels and at a high level of auto- 
mation of the analytical processes. A number of tech-
nical challenges, however, remain. While it is currently 
possible to identify essentially any protein spots that can 
be visualized by common staining methods, it is ap- 
parent that without prior enrichment only a relatively 
small and highly selected population of long-lived, 
highly expressed proteins is observed. There are many 
more proteins in a given cell which are not visualized by 
such methods. Frequently it is the low abundance pro- 
teins that execute key regulatory functions. 

We have outlined the two principal ways proteome anal-
ysis is currently being used to intersect with biological 
research projects: the proteome as a database or data 
archive and proteome analysis as a biological assay. Both 
approaches have in common that at present they are con- 
ceptually and technically limited. Current proteome data- 
bases typically are limited to one cell type and one state 
of a cell and therefore do not account for the dynamics 
of biological systems. The use of proteome analysis as a 
biological assay can provide a wealth of information, but 
it is limited to the proteins detected and is therefore not 
truly proteome-wide. These limitations in proteomics are 
to a large extent a reflection of the fact that proteins in
their fully processed form cannot easily be amplified and 
are therefore difficult to isolate in amounts sufficient for 
analysis or experimentation. The fact that to date no 
complete proteome has been described further attests to 
these difficulties. With continued rapid progress in pro- 
tein analysis technology, however, we anticipate that the 
goal of complete proteome analysis will eventually 
become attainable. 

Introduction

At its current pace, the accumulation of biomedical literature
outpaces the ability of most researchers and clinicians to stay
abreast of their own immediate fields, let alone cover a broader
range of topics. For example, to follow a single disease, e.g.,
breast cancer, a researcher would have had to scan 130 different
journals and read 27 papers per day in 1999. This problem is
accentuated with high-throughput technologies such as DNA
micro-arrays and proteomics, which require the analysis of
large datasets involving thousands of genes, many of which are
unfamiliar to a particular researcher. In any microarray experi-
ment, thousands of genes may demonstrate statistically sig-
nificant expression changes, but only a fraction of these may
be relevant to the study. The ability to interpret these datasets
would be enhanced if they could be compared to a compre-
hensive summary of what is known about all genes. Thus, there
is a need to summarize existing knowledge in a format that
allows for the rapid analysis of associations between genes and
diseases or other specific biological concepts.

One solution to this problem is to compile structured digital
resources, such as the Breast Cancer Gene Database and the
Tumor Gene Database. However, as these resources are hand-
curated, the labor-intensive review process becomes a rate-
limiting step in the growth of the database. As a result, these
databases have a limited scale and the genes are not selected
in a systematic fashion.

An alternative approach is automated text mining: a method
which involves automated information extraction by searching
documents for text strings and analyzing their frequency and
context. This approach has been used successfully in several
instances for biological applications. In most cases, it has been
applied to extract information about the relationships or
interactions that proteins or genes have with one another, in
the literature or by functional annotation. Thus far, few
publications have applied text mining to examine the global
relationships between genes and diseases. Perez-Iratxeta et al.
automatically examined the GO (Gene Ontology) annotation
of genes and their predicted chromosomal locations in order
to identify genes linked to inherited disorders.

To obtain a more global understanding of disease develop-
ment, it would be valuable to incorporate information regarding
all possible gene-disease relationships, including biochemical,
physiological, pharmacological, epidemiological, as well as
genetic. This information would enable comprehensive com-
parisons between large experimental datasets and existing
knowledge in the literature. This would accomplish two things.
First, it would serve to validate experiments by demonstrating
that known responses occur as predicted. Second, it would
rapidly highlight which genes are corroborated by the literature
and which genes are novel in a given context. We have utilized
a computational approach to literature mining to produce a
comprehensive set of gene-disease relationships. In addition,
we have developed a novel approach to assess the strength of
each association based on the frequency of citation and co-
citation. We applied this tool to help interpret the data from a
large micro-array gene expression experiment comparing
normal and cancerous breast tissue.

Journal of Proteome Research 2003, 2, 405-412



Methods 

MedGene Database. MedGene is a relational database, stor-
ing disease and gene information from NCBI, text mining re-
sults, statistical scores, and hyperlinks to the primary lit-
erature. MedGene has a web-based user interface for users to
query the database (http://hipseq.med.harvard.edu/MedGene/).

Text Mining Algorithms. MeSH files were downloaded from
the MeSH web site at NLM (National Library of Medicine) (http://
www.nlm.nih.gov/mesh/meshhome.html) and human disease
categories were selected. LocusLink files were downloaded from
the LocusLink web site at NCBI (http://www.ncbi.nih.gov/
LocusLink/). Official/preferred gene symbol, official/preferred
gene name, and gene alternative symbols and names, all
relevant annotations and URLs for each LocusLink record, were
collected. Gene search terms were used for literature searching
and included all qualified gene names, gene symbols, and gene
family terms. Primary gene keys, predominantly qualified gene
family terms and gene official/preferred symbols, were used
to index Medline records. If the official/preferred gene symbols
did not meet the standards to be an index, then qualified gene
official/preferred names were used. A local copy of Medline
records (up to July 2002) was pre-selected.

A JAVA module examined the MeSH terms and then indexed
each Medline record with the appropriate disease terms. A
separate JAVA module was used to examine the titles and
abstracts for gene search terms and then to index the gene-
related Medline records with the relevant primary gene key(s).

Statistical Methods. For every gene and disease pair, we
counted records that were indexed for both gene and disease
(double positive hits), for disease only (disease single hits), for
gene only (gene single hits), and for neither gene nor disease
(double negative hits) to generate a 2 x 2 contingency table.
On the basis of the contingency table framework, we applied
different statistical methods to estimate the strength of gene-
disease relationships and evaluated the results. These methods
included chi-square analysis, Fisher's exact probabilities, rela-
tive risk of gene, and relative risk of disease (http://
hipseq.med.harvard.edu/MedGene/). In addition, we computed
the "product of frequency", which is the product of the
proportion of disease/gene double hits to disease single hits
and the proportion of disease/gene double hits to gene single
hits. To obtain a normal distribution, we transformed all the
statistical scores using the natural logarithm. We selected the
log of the product of frequency (LPF) to validate MedGene and
to use for the analysis with the micro-array data. Spearman
rank-correlation coefficients were used to assess the linear
relationship between LPF and micro-array fold change in
expression level.
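
The counting scheme and the product-of-frequency score described above can be sketched as follows. This is a minimal Python illustration, not the authors' JAVA implementation. The record layout (one pair of sets per Medline record) and the function names are assumptions made for the example, and the denominators follow the wording above literally (double hits relative to the disease-only and gene-only counts).

```python
import math


def contingency_counts(records, gene, disease):
    """Count the four cells of the 2 x 2 table for one gene-disease pair.

    `records` is an iterable of (gene_keys, disease_terms) pairs, each a set;
    this layout is a hypothetical stand-in for indexed Medline records.
    """
    both = gene_only = disease_only = neither = 0
    for gene_keys, disease_terms in records:
        has_gene = gene in gene_keys
        has_disease = disease in disease_terms
        if has_gene and has_disease:
            both += 1            # double positive hit
        elif has_gene:
            gene_only += 1       # gene single hit
        elif has_disease:
            disease_only += 1    # disease single hit
        else:
            neither += 1         # double negative hit
    return both, gene_only, disease_only, neither


def log_product_of_frequency(both, gene_only, disease_only):
    """Natural log of (both / disease_only) * (both / gene_only).

    Returns None when any count is zero, because the score is undefined there.
    """
    if both == 0 or gene_only == 0 or disease_only == 0:
        return None
    return math.log((both / disease_only) * (both / gene_only))


# Toy data: six records, invented for illustration only.
toy_records = [
    ({"ESR1"}, {"Breast Neoplasms"}),
    ({"ESR1"}, {"Breast Neoplasms"}),
    ({"ESR1"}, set()),
    (set(), {"Breast Neoplasms"}),
    (set(), {"Breast Neoplasms"}),
    (set(), set()),
]
counts = contingency_counts(toy_records, "ESR1", "Breast Neoplasms")
lpf = log_product_of_frequency(counts[0], counts[1], counts[2])
```

Because the score is a log of a product of ratios, pairs that are never co-cited have no defined LPF; the sketch returns None for them rather than inventing a floor value.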

Global Analysis. Diseases with at least 50 related genes were
selected for clustering analysis, and the LPF scores were
normalized with total score for each disease. Hierarchical
clustering was done with the "Cluster" software and the
clustering result was visualized using "TreeViewer" (http://
rana.lbl.gov/EisenSoftware.htm).

Breast Tissue Micro-Arrays. Eighty-nine breast cancer
samples (79% ER-positive) and 7 normal breast tissue samples
were selected from the Harvard Breast SPORE frozen tissue
repository and were representative of the spectrum of histo-
logical types, grades, and hormone receptor immuno-pheno-
types of breast cancer. Biotinylated cRNA, generated from the
total RNA extracted from the bulk tumor, was hybridized to
Affymetrix U95A oligonucleotide micro-arrays. These micro-
arrays consist of 12 400 probes, which represent approximately
9000 genes. Raw expression values were obtained using GENE-
CHIP software from Affymetrix, and then further analyzed using
the DNA-Chip Analyzer (dChip) custom software.

Results 

Automated Indexing of Medline Records by Disease and
Gene. To study the gene-disease associations in the literature,
we first compiled complete lists for human diseases and human
genes. To index all Medline records that were relevant to
human diseases, the Medical Subject Heading (MeSH) index
of Medline records was utilized. MeSH is a controlled medical
vocabulary from the National Library of Medicine and consists
of a set of terms or subject headings that are arranged in both
an alphabetic and a hierarchical structure. Medline records
are reviewed manually and MeSH terms are added to each with
software assistance. Twenty-three human disease category
headings along with all of their child terms (see the Supporting
Information, Supplemental Table 1, or visit http://hipseq.
med.harvard.edu/MedGene/publication/s_Table 1.html) were
selected from the 2002 MeSH index, creating a list of 4033
human diseases.

No index comparable to the MeSH index exists for genes,
and thus it was necessary to apply a string search algorithm
for gene names or symbols found in Medline text. A complete
list of genes, gene names, gene symbols, and frequently used
synonyms was collected from the LocusLink database at
NCBI, which contains 53 259 independent records keyed
by an official gene symbol or name (June 18, 2002). For the
purposes of this study, no distinction was made between genes
and their gene products. Authors often use the same name for
both, differentiating the two only by the use of italics, if at all.
For the intended use of this study, this lack of distinction is
unlikely to have a large effect and may in fact be beneficial.
Initial attempts to search the literature using these lists
revealed several sources of false positives and false negatives
(Table 1). False positives primarily arose when the searched
term had other meanings, whereas false negatives arose from
syntax discrepancies, necessitating the development of filters
to reduce these errors. The syntax issues were readily handled
by including alternate syntax forms in the search terms. The
false positive cases, caused by duplicative and unrelated
meanings for the terms, were more difficult to manage. Where
possible, case-sensitive string mapping reduced inappropriate
citations. In many cases, however, this was not sufficient and
the terms had to be eliminated entirely, thereby reducing the
false positive rate but unavoidably under-representing some
genes.

For the purposes of data tracking, a primary gene key was
selected to represent all synonyms that correspond to each
gene. Medline records were indexed with a primary gene key
when any synonym for that key was found in the title or
abstract. Case-insensitive string mapping was used for all
searches except as noted above. No additional weight was
added for multiple occurrences of a term or the co-occurrence
of multiple synonyms for the same gene key.

Analysis of Data Using Advanced Literature Mining

Table 1. Systematic Sources of False Positives and False Negatives in Unfiltered Data*

error type                               false positive/negative   example                                                                     our solution
gene symbol/name is not unique           false positive            MAG: myelin associated glycoprotein; MAG: malignancy-associated protein     eliminate this term
gene symbol is unrelated abbreviation    false positive            PA: pallid homologue (mouse), pallidin (also abbrev. for Pennsylvania)      eliminate this term
gene symbol/name has language meaning    false positive            WAS: Wiskott-Aldrich Syndrome (also the word 'was')                         case-sensitive string search
nonstandard syntax                       false negative            BAG-1 instead of BAG1                                                       add dash term
unofficial gene name/symbol              false negative            P53 instead of TP53                                                         add all gene nicknames
nonspecified gene name                   false negative            estrogen receptor instead of estrogen receptor 1                            add family stem term

* False negatives are real relationships that are undocumented; false positives are asserted relationships that are not real.
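
The primary-gene-key indexing just described can be illustrated with a short sketch. It is written in Python rather than the JAVA used by the authors, and several details are assumptions for the example: the synonym table, the set of terms flagged as case-sensitive, and the use of a token-boundary check so that a symbol is not matched inside a longer word. Returning a set of keys mirrors the statement that repeated or multiple synonyms carry no extra weight.

```python
import re


def build_matchers(synonyms_by_key, case_sensitive_terms=frozenset()):
    """Compile one pattern per synonym, tagged with its primary gene key.

    Terms listed in `case_sensitive_terms` are matched exactly as written;
    all other terms are matched case-insensitively.
    """
    matchers = []
    for key, synonyms in synonyms_by_key.items():
        for term in synonyms:
            flags = 0 if term in case_sensitive_terms else re.IGNORECASE
            # Assumption: require non-alphanumeric characters on both sides.
            pattern = re.compile(
                r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])",
                flags,
            )
            matchers.append((key, pattern))
    return matchers


def index_record(text, matchers):
    """Return the set of primary gene keys whose synonyms occur in `text`."""
    return {key for key, pattern in matchers if pattern.search(text)}


# Hypothetical synonym table built from the examples in Table 1.
synonyms = {
    "TP53": ["TP53", "P53"],      # nickname added for the official symbol
    "BAG1": ["BAG1", "BAG-1"],    # dash variant added
    "WAS": ["WAS"],               # collides with the English word "was"
}
matchers = build_matchers(synonyms, case_sensitive_terms={"WAS"})

keys_a = index_record("Loss of p53 was observed; BAG-1 was elevated.", matchers)
keys_b = index_record("WAS mutations cause Wiskott-Aldrich syndrome.", matchers)
```

In this toy run the lowercase word "was" does not trigger the WAS key, while the uppercase symbol does, which is the behavior the case-sensitive filter is meant to provide.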

Medline records were searched with all qualified gene
identifiers, such as the official/preferred gene symbol, the
official/preferred gene name, all gene nicknames, and all syntax
variants. In situations where there are several members of a
gene family or splice variants, some authors prefer to use a
shortened gene family name, e.g., estrogen receptor instead of
estrogen receptor 1 (ESR1), creating a source of false negatives.
For this reason, gene family stem terms were created for all
genes that have an alpha or numerical suffix (e.g., IL2RA, TGFβ,
ESR1, etc.) and then used to search the literature. The family
stem terms were handled separately from the specific gene
names so that it would be clear when linkages were made to
the gene family versus a specific member in that family.
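
One way to picture the family stem terms is as gene names with their trailing numeric or letter suffix removed. The sketch below is an assumption about how such stems could be derived, not a description of the authors' rules: the suffix pattern (digits, a single letter, or a spelled-out Greek letter) and the function name are hypothetical.

```python
import re

# Assumed suffix shapes: a trailing number, single letter, or Greek letter name,
# separated from the rest of the name by spaces, commas, or hyphens.
_SUFFIX = re.compile(
    r"[\s,-]+(?:\d+|[A-Za-z]|alpha|beta|gamma|delta)$", re.IGNORECASE
)


def family_stem(gene_name):
    """Return the family stem for a gene name, or None if it has no suffix."""
    original = gene_name.strip()
    stem = original
    while True:
        shorter = _SUFFIX.sub("", stem)
        if shorter == stem or not shorter:
            break
        stem = shorter
    return stem if stem != original else None


stem_a = family_stem("estrogen receptor 1")
stem_b = family_stem("interleukin 2 receptor, alpha")
stem_c = family_stem("insulin")
```

Keeping the stems in a separate lookup from the specific gene names, as the text describes, lets a hit on "estrogen receptor" be reported as a family-level link rather than being attributed to ESR1 specifically.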

To improve performance and accuracy, some pre-selection
was applied to the records that were scanned. First, review
articles were eliminated to avoid redundant treatment of
citations. Second, non-English journals were removed because
the natural language filters were only relevant to English
publications. Finally, journals unlikely to contain primary data
about gene-disease relationships were also removed (e.g., Int.
J. Health Educ., Bedside Nurse, and J. Health Econ.). Together,
these filters reduced the 12 198 221 Medline publications (July
2002) by 37%.

Ranking the Relative Strengths of Gene-Disease Associa-
tions. In total, there were 618 708 gene-disease co-citations,
in which 16% (8297) of all studied genes had been associated
with a disease and 96% (3875) of all diseases had been associated
with at least one gene. To rank the relative strengths of gene-
disease relationships, we tested several different statistical
methods and examined the results. With the exception of the
relative risk estimates, the methods provided similar results
with respect to the rank order of the gene-disease association
strengths. However, after comparing the results to other
databases and after consulting disease experts, the log of the
product of frequency (LPF) was selected for further analysis
because it gave the best results overall.

Validation of MedGene. In developing this tool, it was
important to minimize the number of missed genes (false
negatives) and miscalled genes (false positives). However, in
situations when these goals were in conflict, inclusiveness was
prioritized. To determine the false negative rate in MedGene,
breast cancer was used as a test case because it was associated
with more genes than any other human disease and because
there were several public databases that link genes to breast
cancer. We compared the list of breast cancer-related genes
from MedGene to these databases, as illustrated in Figure 1.
Among the 285 distinct breast cancer-related genes that were
supported by at least one literature citation in these hand-
curated databases, 26 were absent from MedGene, suggesting
a false negative rate of approximately 9%. To determine why
these were missed, all literature references for these genes (80
papers) were reviewed manually (see the Supporting Informa-
tion, Supplemental Table 2, or visit http://hipseq.med.
harvard.edu/MedGene/publication/s_Table 2.html). Among
these papers, most false negatives were caused by nonstandard
gene terms or gene terms eliminated by our specificity filters.
Few genes were missed because they were only mentioned in
review papers (0.4%) or they appeared only in the body of the
manuscript but not the abstract or title (1.1%). Of note,
MedGene identified approximately 2000 additional breast
cancer-related genes not listed in any other database.

Figure 1. Estimation of the false negative rate by comparison
with hand-curated databases. The breast cancer-related genes
identified by MedGene were compared with those listed in
several other databases including the Tumor Gene Database,
the Breast Cancer Gene Database (BCG), GeneCards
(GC), and Swissprot. Genes were considered false negatives
if they were represented in at least one of these other databases
and not in MedGene and their link to breast cancer was sup-
ported by at least one literature reference. All literature references
were verified by manual review to confirm their validity. The
number of genes in each database or shared by more than one
database is indicated. The false negative rate was calculated as
genes missed by MedGene (26)/total number of nonoverlapping
genes in other databases (285).
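
The false negative estimate above is a set comparison: pool the genes from the hand-curated databases, remove duplicates, and count how many are absent from the mined list. A minimal Python sketch follows; the function name and the toy gene sets are hypothetical, and only the final arithmetic (26 of 285) comes from the text.

```python
def false_negative_rate(mined_genes, curated_databases):
    """Fraction of curated genes (non-overlapping union) missing from the mined set.

    `curated_databases` maps a database name to its set of genes.
    Returns (rate, missed_genes); the rate is None if no curated genes are given.
    """
    reference = set()
    for genes in curated_databases.values():
        reference |= set(genes)
    if not reference:
        return None, set()
    missed = reference - set(mined_genes)
    return len(missed) / len(reference), missed


# Toy example with invented database contents.
curated = {
    "database_a": {"BRCA1", "TP53"},
    "database_b": {"BRCA1", "ERBB2", "XYZ1"},
}
mined = {"BRCA1", "TP53", "ERBB2", "ESR1"}
toy_rate, toy_missed = false_negative_rate(mined, curated)

# The figure reported in the text: 26 missed out of 285 distinct curated genes.
reported_rate = 26 / 285
```

With the reported counts the ratio is about 0.091, which is the "approximately 9%" quoted above.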

To assess the false positive error rate, two complementary
approaches were used: a detailed analysis of one disease and
a global examination of 1000 diseases. The detailed approach
examined the false positive error rate and its sources, whereas
the global approach tested whether the overall results made
biomedical sense.

Using the LPF, 1467 genes related to prostate cancer were
assembled in rank order. We then retrieved approximately 300
Medline records each for the highest ranked 100 and the lowest
ranked 200 genes and manually reviewed the titles and
abstracts to determine the verity of the association. Nearly 80%
of the highest ranked 100 genes fell into one of the five
categories that reflect meaningful gene-disease relationships
(see the Supporting Information, Supplemental Table 3, or visit
http://hipseq.med.harvard.edu/MedGene/publication/
s_Table 3.html). Among the lowest ranked 200 genes, ap-
proximately 70% reflected true relationships. Of the 600 records
reviewed, there were only two in which the association between
the gene and the disease was described as negative. Both were
genes with very low scores. In both cases, the authors did not
argue the absence of any relationship, but rather that a
particular feature of the gene or protein was not shown to be
related to human prostate cancer.

The coincidence of some gene symbols with medical ab-
breviations, chemical abbreviations, and biological abbrevia-
tions resulted in most of the false positives (see the Supporting
Information, Supplemental Table 4, or visit http://hipseq.
med.harvard.edu/MedGene/publication/s_Table 4.html), em-
phasizing the importance of the filters that were added in the
search algorithm (Table 1). Without the filters, the false positive
rate more than doubled, and the false negative rate rose
dramatically (data not shown). For example, among the papers
about breast cancer, there were only 12 Medline records that
referred to ESR1 and 10 to ESR2, whereas almost 2000 papers
mentioned estrogen receptor without specifying ESR1 or ESR2;
this latter group was detected by the family stem term filter.

To further validate these results, a global analysis of the gene-
disease relationships described by MedGene was performed.
For this experiment, it was reasoned that the more closely
related the diseases are to one another, the more they will be
related to the same gene sets. Thus, if the relationships defined
by MedGene accurately reflected the literature, then an unsu-
pervised hierarchical clustering of the gene data should group
diseases in a manner consistent with common medical think-
ing. Conversely, if the clustered diseases do not make sense
biologically or medically, it may reflect excessive false positives,
false negatives, or inappropriate scoring of the data.

To execute this experiment, the gene sets and the corre-
sponding LPF values for 1000 randomly selected diseases (each
with at least 50 gene relationships) were used as a dataset for
clustering the diseases. A review of the results showed that the
resulting disease clusters were indeed logical based upon
common medical knowledge (see the Supporting Information,
Supplemental Figure 1, or visit http://hipseq.med.harvard.edu/
MedGene/publication/s_Figure 1.html). For example, in one
such cluster shown in Figure 2, diabetes and its complications
grouped together and were also closely linked to diseases
associated with starvation states.

The number of genes associated with a given disease can
be estimated by adjusting the MedGene number up by the false
negative rate (~9%) and down by the false positive rate (~26%
on average). Using this, the average disease has 103.7 ± 45.3
(mean ± s.d.) genes associated with it, although the range is
quite broad, with 2359 genes related to breast cancer, 2122
genes related to lung cancer, and no genes related to a number
of diseases.

Applying MedGene to the Analysis of Large Datasets. Access
to a comprehensive summary of the genes linked to human
diseases provided an opportunity to analyze data obtained from
a high-throughput experiment. We compared the MedGene
breast cancer gene list to a gene expression data set generated
from a micro-array analysis comparing breast cancer and
normal breast tissue samples. Micro-array analysis identified
2286 genes that had greater than a 1-fold difference in mean
expression level between breast cancer samples and normal
breast samples. Using MedGene, we sorted the 2286 genes into
four classes: 555 genes directly linked to breast cancer in the
literature by gene term search (first-degree association by gene
name); 328 genes directly linked by family term search (first-
degree association by family term); 1021 genes linked to breast
cancer only through other breast cancer genes (second-degree
association); and 505 genes not previously associated with
breast cancer. (See the Supporting Information, Supplemental
Figure 2, or visit http://hipseq.med.harvard.edu/MedGene/
publication/s_Figure 2.html.) Among the 505 previously un-
related genes, 467 were either newly identified genes or genes
that had not previously been associated with any disease.
Among the remaining 38 genes, 0 had been related to other
cancers, specifically esophageal, colon, uterine, skin, and cervix.
To determine whether the genes highlighted by the micro-
array analysis were more likely to have been previously linked
to breast cancer in the literature, we created a two-dimensional
plot of the fold change of expression level between breast
cancer and normal tissue versus the literature score (LPF)
(Figure 3A). There was a broad spread of expression changes
among the genes directly linked to breast cancer, ranging from
less than 1-fold change (68%) to over 40-fold (0.3%). Notably,
the majority of genes with greater than 10-fold expression
changes were linked to breast cancer by first-degree associa-
tion.

Among all 754 genes directly linked to breast cancer in the
literature, there was no correlation between LPF and micro-
array fold change (r = 0.018, p-value = 0.62). However, when
we stratified the analysis based on the magnitude of the fold
change, we observed an increasing trend in correlation (Figure
3B), suggesting that genes with a more substantial change in
expression level were more likely to have a stronger association
in the literature. For genes that had 10-fold change or more in
expression level, the correlation increased to 0.41 (p-value =
0.05).

When we evaluated the micro-array data separately for ER
positive and ER negative tumors, the trend in correlation
between fold change and literature score was highly dependent
on estrogen receptor status. Interestingly, there was a similar
trend in correlation for ER positive tumors, but no trend in
correlation for ER negative tumors.

Finally, to validate our findings, we computed similar cor-
relations between the breast cancer expression data and
LPF scores generated by MedGene for hypertension, a
disease unrelated to breast cancer. As expected, we did not
observe an increasing trend in correlation for hyperten-
sion.

[Figure 2: disease cluster. Leaf labels: Coxsackievirus Infections;
Obesity in Diabetes; Diabetic Ketoacidosis; Glucose Intolerance;
Diabetes Mellitus, Non-Insulin-Dependent; Diabetes Mellitus,
Insulin-Dependent; Pregnancy in Diabetics; Diabetic Retinopathy;
Diabetic Angiopathies; Diabetic Neuropathies; Glycosuria;
Hyperinsulinism; Hyperinsulinemia; Hypoglycemia; Hyperglycemia;
Diabetes Mellitus, Experimental; Diabetes Mellitus; Diabetes,
Gestational; Starvation; Jaundice, Neonatal; Brain Edema;
Pulmonary Edema; Nutrition Disorders; Kwashiorkor; Critical
Illness; Burns; Diabetic Nephropathies; Albuminuria; Insulinoma.]

Table 2. Top 25 Genes Related to Selected Human Diseases*










rheumatoid arthritis


bipolar disorder


estrogen receptor
PGR

ERBB2
BRCA1
BRCA2
EG1H
CYP19
TFF1
PSEN2
TP53


REN
DBP
IEP

ACT
INS

kallikrein
ACE

endothelin
S100A6
BDK


RA

TNFRSF10A
CRT
AS
ESR1

HLA-DRB1
DRI

interleukin
TNF 
R6 


ERDAI 
SNAP29 
PFKL 
DRD2 
TRH 
IMPA2 
HTRSA 
DRD3 
REM 
KCNN3 


CESS 
CEACAM5 


DlANPH 
SARI 


collagen 
ILIA 


DRD4 
HTR2C 


ERBB3 — 


m 


ACR --• - 

TNFRSFI2 

BJ 

CHJ3L1 
US 

mwneuiun i 
matrix 

metalloprotelnase 

Interferon 

CDS8 

IL4 
HI? 


REIN 

DBH 

MAOA 

HTRZA 

SYNJI 


cydin 
COXSA 
cathepsin 
ER6B4 

TRAM 


CD59 
ALB 

CYPUB2 
MAT2B 
angiotensin 
receptor 


CCND1 

ECF 

MUCt 

Insulln-IIke 
BCU 


AGTR2 

NPPA 

LVM 

DBH 
NPY 


INPPI 

NEDD4L 

FRAI3C 

transducer or 

ERBB2 

BAIAP3 


mucin 
FGF3 


POMC 
neuropeptide 


MMP3 
SO. 


ATP1B3 
DRDS 



atherosclerosis



apolipoprotein
APOE
LDLR
ELN
ARG1
APOB
APOA1
MSR1
tPl
PON1

plasminogen activator inhibitor
PLC

vascular cell adhesion molecule
ATOH1
VWF
INS
ARG2

ABCA1

OLR1
collagen
MCP

lipoprotein
APOA2
intercellular adhesion molecule
RAB27A 



Discussion 

The Human Genome Project heralded a new era in biological
research where the emphasis on understanding specific path-
ways has expanded to global studies of genomic organization
and biological systems. High-throughput technologies can
provide novel insight into comprehensive biological function
but also introduce new challenges. The utility of these
technologies is limited by the ability to generate, analyze, and
interpret large gene lists. MedGene, a relational database
derived by mining the information in Medline, was created to
address this need. MedGene users can query for a rank-ordered
list of human gene-disease relationships (Table 2) for one or
more diseases. Each entry is hyperlinked to the original papers
supporting each association and to other relevant databases.

MedGene is an innovative extension of previous text mining
approaches. Perez-Iratxeta et al. used the GO annotation and
their chromosomal locations to predict genes that may con-
tribute to inherited disorders. MedGene takes a broader view
and includes all diseases and all possible gene-disease relation-
ships. Furthermore, MedGene utilizes co-citation to indicate a
relationship rather than GO annotation, which is limited to the
subset of genes that have GO annotation. Our approach is
complementary to that taken by Chaussabel and Sher, who
used the frequency of co-cited terms to cluster genes into a
hierarchy of gene-gene relationships.

A unique aspect of this tool is the ability to assess the relative
strengths of gene-disease relationships based on the frequency
of both co-citation and single citation. This presupposes that
most co-citations describe a positive association, often referred
to as "publication bias", and is supported by our observations
that negative associations are rare (Supplemental Table 3:
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 3.html).
Of course, relationships established by frequency
of co-citation do not necessarily represent a true biological link;
however, it is strong evidence to support a true relationship.

Another important feature of MedGene is the implementa-
tion of software filters that substantially reduced the error rate.
We estimate that less than 10% of all associations were missed
and at least 70% of even the weakest associations were real.
For this study, all of the filters that we applied were general
ones, e.g., expanding the list of all gene names to address the
different syntax forms used by different journals, eliminating
gene names that correspond to common English words, etc.
The majority of the remaining search term ambiguities were
idiosyncratic and difficult to identify systematically without
causing a significant rise in false negatives. Alternative ap-
proaches, such as the examination of the nearest neighbor
terms, need to be considered to further reduce the false positive
rate.

It is not uncommon to see expression changes in micro-
array experiments as small as 2-fold reported in the literature.
Even when these expression changes are statistically significant,
it is not always clear if they are biologically meaningful. When
comparing expression levels of disease to normal tissue, one
expects an enrichment of known disease-related genes to
appear in the altered expression group. MedGene provided a
unique opportunity to test this notion in the context of existing
knowledge on a novel breast cancer micro-array dataset. For
genes displaying a 5-fold change or less in tumors compared
to normal, there was no evidence of a correlation between
altered gene expression and a known role in the disease. This
reflects the many genes whose role in breast cancer may not
involve large changes in expression in sporadic tumors (e.g.,
BRCA1 and BRCA2) and genes whose modest changes in
expression may be unrelated to the disease. Strikingly, among
genes with a 10-fold change or more in expression level, there
was a strong and significant correlation between expression
level and a published role in the disease, providing the first
global validation of the micro-array approach to identifying
disease-specific genes.

Table 3. Genes with Large Expression Changes in ER- but
Not in ER+ Breast Tumors

gene symbol    fold change (ER+)    fold change (ER-)
KRTHB1          1.0                  610.8
BRS3            1.2                   89.4
DKK1            1.2                   69.8
ZIC1            1.9                   59.6
TLR1            1.0                   35.5
KIAA0580        2.6                   33.2
CDKN3           1.0                   30.6
EBI2            4.0                   27.9
GZMB            3.8                   21.9
STK15           4.7                   18.6
GPR49           1.0                   14.6
mow             1.6                   14.4
LAD1           -1.0                   13.5
POLES           4.2                   13.0
HMG4            4.4                   12.9
BCL2L11        -1.2                   12.3
LRP8            2.9                   12.2
CCNB2           1.0                   11.8
CCNE2           4.0                   11.6
KB             -4.3                   11.1
KNSL6           2.9                   10.9
HIF5            3.0                   10.2
SERPINH2        4.6                   10.2
YAP1            1.0                   10.0
LPHB           -1.3                  -10.4
TCEA2          -1.1                  -10.8
TFF1            1.3                  -11.4
COL17A1        -4.1                  -15.7
POPS            1.1                  -16.2
BPAG1          -4.6                  -22.3
PDZK1          -1.1                  -36.8
VEGFC          -2.8                  -51.5
MUC6           -1.4                  -64.9
SERPINA5       -1.0                  -83.1
MEIS1          -1.6                  -85.9
CA12            2.4                 -150.3

Table 3. MedGene identified a set of relatively understudied, yet highly
expressed genes in ER negative, but not ER positive breast tumors. All of
these genes have either never been co-cited with breast cancer or have a
weak association except those marked with an *.

The results derived from MedGene have two implications.
First, a careful hunt for corroborating evidence of a role in
breast cancer should precede any further study of genes with
less than 5-fold expression level changes. Second, any genes
with 10-fold changes or more are likely to be related to breast
cancer and warrant attention. It is likely that this threshold will
change depending on the disease as well as the experiment.

Interestingly, the observed correlation was only found among
ER-positive tumors, not ER-negative. This may reflect a bias
in the literature to study the more prevalent type of tumor in
the population. Furthermore, this emphasizes that caution
must be taken when interpreting experiments that may contain
subpopulations that behave very differently. The MedGene
approach identified a set of relatively understudied, yet highly
expressed genes in ER-negative tumors that are worthy of
further examination (Table 3).

In conclusion, we have developed an automated method of
summarizing and organizing the vast biomedical literature. To
our knowledge, the resulting database is the most comprehen-
sive and accurate of its kind. By generating a score that reflects
the strength of the association, it provides an important tool
for the rapid and flexible analysis of large datasets from various
high-throughput screening experiments. Furthermore, it can
be used for selecting subsets of genes for functional studies,
for building disease-specific arrays, for looking at genes com-
mon to multiple diseases, and various other high-throughput
applications. In the future, it will be possible to enhance the
utility of the MedGene database by building links between
genes and other MeSH terms as well as other biological
processes and concepts, such as cell division and responses to
small molecules.